Progress Summary

A spatio-temporal system for recognizing handprinted digit strings was designed and trained to recognize
handprinted ZIP codes. The results of our work on a biologically motivated model of reflexive reasoning
were used to implement a pilot system for performing rapid reasoning using very large knowledge bases. The
pilot system, which runs on a 32-node CM-5, can encode over 300,000 items and respond in less than 500
msec to queries requiring reasoning up to a depth of eight.

Progress  Report 

We have continued our investigation of the representational capabilities of spatio-temporal networks and
their application to reflexive reasoning and pattern recognition. These networks use recurrent connections
and variable delay links. In addition to the firing rate, the firing time of cells relative to other cells carries
representational significance in these models (the synchronous firing of cells being an important special case).

We finished the design of a spatio-temporal model for handprinted digit string recognition. The model
was trained to recognize handprinted ZIP codes. In addition to the obvious practical significance, the work
furthers our understanding of spatio-temporal models for pattern recognition and demonstrates that the
approach offers a natural solution to the problem of shift-invariance, enables a pattern recognition system
to handle arbitrarily long inputs, and partially solves the segmentation/recognition dilemma. In earlier work
we had developed a system for isolated digit recognition and done some preliminary work on extending the
system to connected pairs of digits. The additional work extended the system to do full word (ZIP code)
recognition. The results of this work are described in an article submitted to the journal Connection Science
for publication and were the subject of Thomas Fontaine’s PhD dissertation (December 1993).

We are also leveraging the results of our reflexive reasoning system based on temporal synchrony to build
a system for performing rapid reasoning using very large knowledge bases. The aim is to build a system
whose response time is fast enough to support inferencing for a real-time speech understanding system. This
means being able to respond to retrieval as well as inferential queries within a few hundred milliseconds. We
have a pilot implementation on a 32-node CM-5 that can encode over 300,000 rules, facts, and types and
respond to queries whose response requires inferences that are 10 deep in about 500 msec. The effectiveness
of the implementation can be directly attributed to the constraints on representation and inference suggested
by the use of temporal synchrony for expressing dynamic bindings. The results of the CM-5 implementation
are described in the enclosed technical report (ICSI TR-94-031). This research is the topic of D.R. Mani’s
PhD dissertation.

Results of work on our model for reflexive reasoning using temporal synchrony have appeared in the journals
Behavioral and Brain Sciences and Connection Science and in the 1993 International Joint Conference on
Artificial Intelligence.

I have begun investigating a possible solution to the “catastrophic interference problem.” In brief, the
problem is this: If a network that has already been trained to solve task A is trained to solve task B, it
forgets the solution to task A unless it is simultaneously retrained on task A. This problem is an inherent
weakness of most incremental learning algorithms and is perhaps the biggest impediment to the development
of scalable learning systems. The solution being investigated is as follows: Initially the system focuses on
a small number of categories. After it learns these categories, it tries to identify which features formed
in the “hidden layer” play a crucial role in the recognition of these categories. The system freezes these
crucial features, and as a result they cannot be obliterated during subsequent learning (although they may
undergo some fine tuning). These frozen features are, however, available to other structures that are learned
subsequently to recognize other categories. An important claim is that the set of features will gradually
stabilize and that learning new categories will get progressively easier and will involve combining existing
features in the appropriate manner. These ideas are being investigated in the context of training
spatio-temporal networks to recognize digit strings.
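
The freezing step can be illustrated with a minimal sketch. The report does not say how crucial features are identified or how freezing is implemented, so everything below is an assumption made for illustration: the hypothetical function `masked_update`, the boolean `frozen` mask, and the `fine_tune` factor that lets frozen units move only slightly.

```python
def masked_update(weights, grads, frozen, lr=0.1, fine_tune=0.0):
    """Apply one gradient step to the hidden-unit weights.

    weights, grads: one row of numbers per hidden unit (hypothetical layout).
    frozen: one boolean per hidden unit; True marks a "crucial" feature.
    fine_tune: fraction of the normal step a frozen unit may still take
               (0.0 = fully frozen, small positive = fine tuning only).
    """
    updated = []
    for w_row, g_row, is_frozen in zip(weights, grads, frozen):
        scale = fine_tune if is_frozen else 1.0
        updated.append([w - lr * scale * g for w, g in zip(w_row, g_row)])
    return updated


# Unit 0 was judged crucial for the categories learned so far; unit 1 was not.
weights = [[0.5, -0.2], [0.1, 0.3]]
grads = [[1.0, 1.0], [1.0, 1.0]]      # gradients from training on a new category
frozen = [True, False]

new_weights = masked_update(weights, grads, frozen)
```

With `fine_tune=0.0` the frozen unit keeps its weights while the free unit continues to learn; later structures can still read the frozen unit's output as an input feature.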

Abstract: Human agents draw a variety of inferences effortlessly, spontaneously, and with remarkable efficiency - as though these
inferences were a reflexive response of their cognitive apparatus. Furthermore, these inferences are drawn with reference to a large
body of background knowledge. This remarkable human ability seems paradoxical given the complexity of reasoning reported by
researchers in artificial intelligence. It also poses a challenge for cognitive science and computational neuroscience: How can a system
of simple and slow neuronlike elements represent a large body of systematic knowledge and perform a range of inferences with such
speed? We describe a computational model that takes a step toward addressing the cognitive science challenge and resolving the
artificial intelligence paradox. We show how a connectionist network can encode millions of facts and rules involving n-ary predicates
and variables and perform a class of inferences in a few hundred milliseconds. Efficient reasoning requires the rapid representation
and propagation of dynamic bindings. Our model (which we refer to as SHRUTI) achieves this by representing (1) dynamic bindings as
the synchronous firing of appropriate nodes, (2) rules as interconnection patterns that direct the propagation of rhythmic activity, and
(3) long-term facts as temporal pattern-matching subnetworks. The model is consistent with recent neurophysiological evidence that
synchronous activity occurs in the brain and may play a representational role in neural information processing. The model also makes
specific psychologically significant predictions about the nature of reflexive reasoning. It identifies constraints on the form of rules
that may participate in such reasoning and relates the capacity of the working memory underlying reflexive reasoning to biological
parameters such as the lowest frequency at which nodes can sustain synchronous oscillations and the coarseness of synchronization.

Keywords: binding problem; connectionism; knowledge representation; long-term memory; neural oscillations; reasoning;
short-term memory; systematicity; temporal synchrony; working memory


1. Introduction

The ability to represent and reason with a large body of
knowledge in an effective and systematic manner is a
central characteristic of cognition. This is borne out by
research on artificial intelligence and cognitive science,
which suggests that reasoning underlies even the most
commonplace intelligent behavior. For example, language
understanding, a task we usually perform rapidly
and effortlessly, depends upon our ability to make
predictions, generate explanations, and recognize speakers’
plans. To appreciate the richness and speed of human
reasoning, consider the following example derived from
Schubert (1989). Imagine a person reading a variation of
the Little Red Riding Hood (LRRH) story, in which the
wolf intends to eat LRRH in the woods. The reader is at
the point in the story where the wolf, who has followed
LRRH into the woods, is about to attack her. The next
sentence reads: “The wolf heard some woodcutters
nearby and so he decided to wait.” It seems reasonable to
claim that the reader will understand this sentence
spontaneously and without conscious effort. However, a
careful analysis suggests that even though the reader remains
unaware of it, understanding this sentence requires a
chain of reasoning that may be described informally as
follows (parenthetical text identifies the background
knowledge that might mediate the reasoning process):
The  wolf  will  approach  LRRH  (to  eat  something  you 
have  to  be  near  it);  LRRH  will  scream  (because  a  child  is 
scared  by  an  approaching  wild  animal);  upon  hearing 
the  scream  the  woodcutters  will  know  that  a  child  is  in 
danger  (because  a  child's  screaming  suggests  that  she  is 
in  danger);  the  woodcutters  will  go  to  the  child  (people 
want  to  protect  children  in  danger,  and  in  part  this 
involves  determining  the  source  of  the  danger);  the 
woodcutters  will  try  to  prevent  the  wolf  from  attacking 
LRRH  (people  want  to  protect  children);  in  doing  so 
the  woodcutters  may  hurt  the  wolf  (preventing  an 
animal  from  attacking  may  involve  physical  force);  so 
the  wolf  decides  to  wait  (because  an  animal  does  not 
want  to  get  hurt). 

One could argue that some of the steps in this reasoning
process are precompiled or “chunked,” but it would be
unreasonable to claim that the entire chain of reasoning
can be construed as direct retrieval or even a single-step
inference. Hence, in addition to accessing lexical items,
parsing, and resolving anaphoric reference, some
computation similar to the above chain of reasoning must occur
when the sentence in question is processed. As another
example, consider the sentence “John seems to have
suicidal tendencies; he has joined the Colombian drug
enforcement agency.” In spite of its being novel, we can
understand the sentence spontaneously and without
conscious effort. This sentence, however, could not have
been understood without using background knowledge
and dynamically inferring that joining the Colombian
drug enforcement agency has dangerous consequences,
and since John probably knows this, his decision to join
the agency suggests that he has suicidal tendencies.

As the above examples suggest, we can draw a variety of
inferences rapidly, spontaneously, and without conscious
effort - as though they were a reflexive response of our
cognitive apparatus. Let us accordingly describe such
reasoning as reflexive (Shastri 1990). Reflexive reasoning
may be contrasted with reflective reasoning, which
requires reflection, conscious deliberation, and often an
overt consideration of alternatives and weighing of
possibilities. Reflective reasoning takes longer and often
requires the use of external props such as a paper and
pencil. Examples of such reasoning are solving logic
puzzles, doing cryptarithmetic, or planning a vacation.

Our remarkable ability to perform reflexive reasoning
poses a challenge for cognitive science and neuroscience:
How can a system of simple and slow neuronlike elements
represent a large body of systematic knowledge and
perform a range of inferences with such speed? With
nearly 10^12 computing elements and 10^15 interconnections,
the brain’s capacity for encoding, communicating,
and processing information seems overwhelming. But if
the brain is extremely powerful, it is also extremely
limited: First, neurons are slow computing devices.
Second, they communicate relatively simple messages
that can encode only a few bits of information. Hence a
neuron’s output cannot encode names, pointers, or
complex structures. Finally, the computation performed by a
neuron is best described as an analog spatio-temporal
integration of its inputs. The relative simplicity of a
neuron’s processing ability with reference to the needs of
symbolic computation, and the restriction on the
complexity of messages exchanged by neurons, impose strong
constraints on the nature of neural representations and
processes (Feldman 1989; Feldman & Ballard 1982;
Shastri 1991). [See also Feldman: “Four frames suffice: A
provisional model of vision and space” BBS 8(2) 1985;
Ballard: “Cortical connections and parallel processing:
Structure and function” BBS 9(1) 1986.] As we discuss in
section 2, a reasoning system must be capable of encoding
systematic and abstract knowledge and instantiating it in
specific situations to draw appropriate inferences. This
means that the system must solve a complex version of the
variable-binding problem (see section 2 and Feldman
1982; von der Malsburg 1986). In particular, the system
must be capable of representing composite structures in a
dynamic fashion and systematically propagating them to
instantiate other composite structures. This turns out to
be a difficult problem for neurally motivated models. As
McCarthy (1988) observed, most connectionist systems
suffer from the “unary or even propositional fixation,” with
their representational power restricted to unary
predicates applied to a fixed object. Fodor and Pylyshyn
(1988a) have even questioned the ability of connectionist
networks to embody systematicity and compositionality.

1.1. Reflexive reasoning: Some assumptions, observations and hypotheses

Reflexive reasoning occurs with reference to a large body
of long-term knowledge. This knowledge forms an
integral part of an agent’s conceptual representation and is
retained for a considerable period of time once it is
acquired. We wish to distinguish long-term knowledge
from short-term as well as medium-term knowledge. By
the last we mean knowledge that persists longer than
short-term knowledge and may be remembered for days
or even weeks. Such medium-term knowledge, however,
may be forgotten without being integrated into the agent’s
long-term conceptual representation. The distinction
between medium- and long-term knowledge is not arbitrary
and seems to have a neurological basis. It has been
suggested that medium-term memories are encoded via
long-term potentiation (LTP) (Lynch 1986), and that some of
them are subsequently converted into long-term memories
and encoded via essentially permanent structural changes
(see, e.g., Marr 1971; Squire 1987; Squire & Zola-Morgan
1991).

An agent’s long-term knowledge base (LTKB) encodes
several kinds of knowledge. These include specific
knowledge about particular entities, relations, events, and
situations, and general systematic knowledge about the
regularities and dependencies in the agent’s environment. For
example, an agent’s LTKB may contain specific
knowledge such as “Paris is the capital of France” and “Susan
bought a Rolls-Royce,” as well as systematic and
instantiation-independent knowledge such as “if one
buys something then one owns it.” We will refer to
specific knowledge as facts, and general
instantiation-independent knowledge as rules (note that by a rule we do
not mean a “rule of inference” such as modus ponens). The
LTKB may also include knowledge about the attributes and
features of concepts and the superordinate/subordinate
relations among concepts, and also procedural knowledge
such as “how to mow a lawn.”

In discussing the LTKB we are focusing on
representational adequacy, that is, the need to represent entities,
relations, inferential dependencies, and specific as well as
general knowledge. The expressiveness implied by this
generic specification, however, is sufficient to represent
knowledge structures such as frames (Minsky 1975),
scripts (Schank & Abelson 1977), and productions or
if-then rules (Newell & Simon 1972).

A serious attempt at compiling commonsense
knowledge suggests that the LTKB may contain as many as 10^8
items (Guha & Lenat 1990). This should not be very
surprising given that it must include, besides other
things, our knowledge of naive physics and naive
psychology; facts about ourselves, our family, and friends; facts
about history and geography; our knowledge of artifacts;
sports, art, and music trivia; and our models of social and
civic interactions.

1.1.1.  Space  and  time  constraints  on  a  reflexive  reasoner. 

Given that there are about 10^12 cells in the brain, the
expected size of the LTKB (10^8) rules out any encoding
scheme whose node requirement is quadratic (or higher)
in the size of the LTKB. In view of this we adopt the
working hypothesis that the node requirement of a model
of reflexive reasoning should be no more than linear in
(i.e., proportional to) the size of the LTKB. This is a
reasonable hypothesis. Observe that (1) a node in an
idealized computational model may easily correspond to a
hundred or so actual cells, and (2) the number of cells
available for encoding the LTKB can only be a fraction of
the total number of cells.
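
The arithmetic behind this hypothesis can be checked with a short sketch. The orders of magnitude used here (about 10^12 cells, about 10^8 LTKB items, about 100 cells per idealized node) are taken as working assumptions, and the constant of one node per item in the linear scheme is purely illustrative.

```python
CELLS_IN_BRAIN = 10**12   # assumed order of magnitude
LTKB_ITEMS = 10**8        # assumed size of the long-term knowledge base
CELLS_PER_NODE = 100      # "a hundred or so actual cells" per idealized node


def cells_needed(nodes):
    """Number of actual cells required to realize the given number of nodes."""
    return nodes * CELLS_PER_NODE


# Linear scheme: node count proportional to the LTKB size (constant 1, illustrative).
linear_fraction = cells_needed(LTKB_ITEMS) / CELLS_IN_BRAIN

# Quadratic scheme: node count grows with the square of the LTKB size.
quadratic_ratio = cells_needed(LTKB_ITEMS**2) / CELLS_IN_BRAIN

print(f"linear scheme uses {linear_fraction:.0%} of all cells")
print(f"quadratic scheme needs {quadratic_ratio:.0e} times all cells")
```

Under these assumptions a linear encoding occupies about one percent of the available cells, while a quadratic encoding would need about a million times more cells than exist.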

We believe that although the size of an agent’s LTKB
increases considerably from, say, age 10 to 30, the time
taken by an agent to understand natural language does
not. This leads us to suspect that the time taken by an
episode of reflexive reasoning does not depend on the
overall size of the LTKB but only on the complexity of
the particular episode of reasoning. Hence we adopt the
working hypothesis that the time required to perform
reflexive reasoning is independent of the size of the
LTKB.

The  independence  of  (1)  the  time  taken  by  reflexive 
reasoning  and  (2)  the  size  of  the  LTKB  implies  that 
reflexive  reasoning  is  a  parallel  process  and  involves  the 
simultaneous  exploration  of  a  number  of  inferential  paths. 
Hence,  a  model  of  reflexive  reasoning  must  be  parallel  at 
the  level  of  rule  application  and  reasoning,  that  is,  it  must 
support  knowledge-level  parallelism.  This  is  a  critical 
constraint  and  one  that  is  not  necessarily  satisfied  by  a 
connectionist  model  simply  because  it  is  “connectionist” 
(see  also  Sumida  &  Dyer  1989). 

We understand written language at the rate of
somewhere between 150 and 400 words per minute (Carpenter
& Just 1977). In other words, we can understand a typical
sentence in a matter of one to two seconds. Given that
reflexive reasoning occurs during language
understanding, it follows that episodes of reflexive reasoning may
take as little as a few hundred milliseconds.

1.1.2. Reflexive reasoning is limited reasoning. Complexity
theory rules out the existence of a general-purpose
reasoning system that derives all inferences efficiently.
This entails that there must exist constraints on the class
of reasoning that may be performed in a reflexive manner.
Not surprisingly, cognitive agents can perform only a
limited class of inferences with extreme efficiency.
Naturally, we expect that the representational and reasoning
ability of the proposed system will also be constrained and
limited in a number of ways. However, we would like the
strengths and limitations of the system to be
psychologically plausible and to mirror some of the strengths and
limitations of human reasoning.

1.2.  Computational  constraints 

Connectionist models (Feldman & Ballard 1982;
Rumelhart & McClelland 1986) are intended to emulate the
information-processing characteristics of the brain -
albeit at an abstract computational level - and to reflect its
strengths and weaknesses. Typically, a node in a
connectionist network corresponds to an idealized neuron, and a
link corresponds to an idealized synaptic connection. Let
us enumerate some core computational features of
connectionist models: (1) Nodes compute very simple
functions of their inputs. (2) They can only hold limited state
information - while a node may maintain a scalar
“potential,” it cannot store and selectively manipulate bit
strings. (3) Node outputs do not have sufficient resolution
to encode symbolic names or pointers. (4) There is no
central controller that instructs individual nodes to
perform specific operations at each step of processing.

1.3.  A  preview 

We discuss the variable-binding problem as it arises in the
context of reasoning and describe a neurally plausible
solution to this problem. The solution involves
maintaining and propagating dynamic bindings using synchronous
firing of appropriate nodes. We show how our solution
leads to a connectionist knowledge representation and
reasoning system (which we call SHRUTI, see Response,
Note 1) that can encode a large LTKB consisting of facts
and rules involving n-ary predicates and variables, and
perform a broad class of reasoning with extreme
efficiency. Once a query is posed to the system by initializing
the activity of appropriate nodes, the system computes an
answer automatically and in time proportional to the
length of the shortest chain of reasoning leading to
the conclusion. The ability to reason rapidly is a
consequence, in part, of the system’s ability to maintain
and propagate a large number of dynamic bindings
simultaneously.

The view of information processing implied by the
proposed system is one where (1) reasoning is the
transient but systematic propagation of a rhythmic pattern of
activity, (2) each entity in the dynamic memory is a phase
in this rhythmic activity, (3) dynamic bindings are
represented as the synchronous firing of appropriate nodes, (4)
long-term facts are subnetworks that act as temporal
pattern matchers, and (5) rules are interconnection
patterns that cause the propagation and transformation of
rhythmic patterns of activity.
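
Points (2), (3), and (5) can be made concrete with a toy sketch. It is a symbolic abstraction and not the network itself: phases are modeled as small integers rather than firing times, and the function names `bind` and `read_bindings` are hypothetical. The "give" fact and its argument names are borrowed from the example developed in section 2.

```python
def bind(bindings):
    """Give each distinct entity its own phase; each argument fires in the
    phase of the entity it is bound to.

    bindings: dict mapping argument name -> entity name.
    Returns (entity_phase, argument_phase).
    """
    entity_phase = {}
    argument_phase = {}
    for argument, entity in bindings.items():
        # A new entity takes the next unused phase; a known one reuses its phase.
        phase = entity_phase.setdefault(entity, len(entity_phase))
        argument_phase[argument] = phase
    return entity_phase, argument_phase


def read_bindings(entity_phase, argument_phase):
    """Recover argument -> entity by matching phases (synchronous firing)."""
    phase_to_entity = {phase: entity for entity, phase in entity_phase.items()}
    return {arg: phase_to_entity[phase] for arg, phase in argument_phase.items()}


# Dynamic fact: give(John, Mary, Book1).
entity_phase, argument_phase = bind(
    {"giver": "John", "recipient": "Mary", "give-object": "Book1"}
)

# A rule acts as an interconnection pattern: the "owner" node is driven by the
# "recipient" node and therefore fires in the same phase (likewise "own-object").
argument_phase["owner"] = argument_phase["recipient"]
argument_phase["own-object"] = argument_phase["give-object"]

decoded = read_bindings(entity_phase, argument_phase)
```

Here `decoded["owner"]` is `"Mary"` because the owner node shares Mary's phase; no name or pointer is ever passed between nodes, only the timing of activity.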

We cite neurophysiological data that suggest that the
basic mechanisms proposed for representing and
propagating dynamic variable bindings, namely, the
propagation of rhythmic patterns of activity and the synchronous
activation of nodes, exist in the brain and appear to play a
role in the representation and processing of information.

Our system predicts a number of constraints on
reflexive reasoning that have psychological implications. These
predictions concern the capacity of the working memory
underlying reflexive reasoning (WMRR) and the form of
rules that can participate in such reasoning. The
predictions also relate the capacity of the WMRR and the time it
would take to perform one step of reasoning to biological
parameters such as the lowest frequency at which nodes
can sustain synchronous oscillations, the coarseness of
synchronization, and the time it takes connected nodes to
synchronize. By choosing biologically plausible system
parameters, we show that it is possible for a system of
neuronlike elements to encode millions of facts and rules
and yet perform multistep inferences in a few hundred
milliseconds.

Reasoning is the spontaneous and natural outcome of
the system’s behavior. The system does not apply
syntactic rules of inference such as modus ponens. There is no
separate interpreter or inference mechanism that
manipulates and rewrites symbols. The network encoding of the
LTKB is best viewed as a vivid internal model of the
agent’s environment, where the interconnections
between (internal) representations directly encode the
dependencies between the associated (external) entities.
When the nodes in this model are activated to reflect a
given state of affairs in the environment, the model
spontaneously simulates the behavior of the external
world and in doing so makes predictions and draws
inferences.

The representational and inferential machinery
developed in this work has wider significance and can be
applied to other problems whose formulation requires the
expressive power of n-ary predicates, and whose solution
requires the rapid and systematic interaction between
long-term and dynamic structures. Some examples of
such problems are (1) parsing and the dynamic linking of
syntactic and semantic structures during language
processing, and (2) model-based visual object recognition
requiring the dynamic representation and analysis of
spatial relations between objects and/or parts of objects.
Recently, Henderson (1992) has proposed the design of a
natural language parser based on our computational
model.


1.4.  Caveats 

Our primary concern has been to extend the
representational and inferential power of neurally plausible
(connectionist) models and to demonstrate their scalability. We
are also concerned that the strengths and limitations of
our system be psychologically plausible. However, our
aim has not been to model data from specific
psychological experiments. What we describe is a partial model of
reflexive reasoning. It demonstrates how a range of
reasoning can be performed in a reflexive manner, and it also
identifies certain types of reasoning that cannot be
performed in a reflexive manner. Our system, however, does
not model all aspects of reflexive reasoning. For example,
we focus primarily on declarative and semantic
knowledge and do not model reflexive analogical reasoning, or
reflexive reasoning involving episodic memory (Tulving
1983) and imagery. We do not say much about what the
actual contents of an agent’s LTKB ought to be, nor do we
provide a detailed answer to the question of learning. We
do, however, discuss in brief how specific facts may be
learned and existing rules modified (sect. 10.6). Neural
plausibility is an important aspect of this work - we show
that the proposed system can be realized by using
neurally plausible nodes and mechanisms, and we investigate
the consequences of choosing biologically motivated
values of system parameters. Needless to say, what we
describe is an idealized computational model and it is not
intended to be a blueprint of how the brain encodes an
LTKB and performs reflexive reasoning.

1.4.1. An outline of the paper. Section 2 discusses the
dynamic-binding problem in the context of reasoning.
Section 3 presents our solution to this problem and the
encoding of long-term rules and facts. Section 4 describes
a reasoning system capable of encoding an LTKB and
answering queries on the basis of the encoded knowledge.
The interface of the basic reasoning system with an IS-A
hierarchy that represents entities, types (categories), and
the super-/subordinate concept relations between them
is described in section 5. Section 6 discusses a solution to
the multiple instantiation problem. Section 7 discusses
the biological plausibility of our system and identifies
neurally plausible values of certain system parameters.
Section 8 points out the psychological implications of the
constraints on reflexive reasoning suggested by the
system. Section 9 discusses related connectionist models and
the marker-passing system NETL. Finally, section 10
discusses some open problems related to integrating the
proposed reflexive-reasoning system with an extended
cognitive system. Certain portions of the text are set in
small type. These cover detailed technical material and
may be skipped without loss of continuity.

2. Reasoning and the dynamic-binding problem

Assume that an agent’s LTKB embodies the following
rules:

1.  If  someone  gives  a  recipient  an  object  then  the 
recipient  comes  to  own  that  object. 

2.  Owners  can  sell  what  they  own. 

Given the above knowledge, an agent would be capable of
inferring “Mary owns Book1” and “Mary can sell Book1”
on being told “John gave Mary Book1.” A connectionist
reasoning system that embodies the same knowledge
should also be capable of making similar inferences and,
hence, exhibiting the following behavior: If the network’s
pattern of activity is initialized to represent the fact “John
gave Mary Book1,” then very soon its activity should
evolve to include the representations of “Mary owns
Book1” and “Mary can sell Book1.”

Let us point out that the knowledge embodied in a rule
may be viewed as having two distinct aspects. A rule
specifies a systematic correspondence between the
arguments of certain “predicates” (where a predicate may be
thought of as a relation, a frame, or a schema). For
example, rule (1) specifies that a “give” event leads to an
“own” event where the recipient of “give” corresponds to
the owner of “own,” and the object of “give” corresponds
to the object of “own.” Let us refer to this aspect of a rule
as systematicity. The second aspect of the knowledge
embodied in a rule concerns the appropriateness of the
specified argument correspondence in a given situation,
depending upon the types (or features) of the argument
fillers involved in that situation. Thus appropriateness
may capture type restrictions that argument fillers must
satisfy in order for a rule to fire. It may also indicate type
preferences and provide a graded measure of a rule’s
applicability in a given situation on the basis of the types
of the argument fillers in that situation.

We  will  first  focus  on  the  problems  that  must  be  solved 
in  order  to  incorporate  systematicity  in  a  connectionist 
system.  In  section  5  we  will  discuss  how  the  solutions 
proposed  to  deal  with  systematicity  may  be  augmented 
to incorporate appropriateness and represent
context-dependent rules that are sensitive to the types of the
argument  fillers. 

If we focus on systematicity, then rules can be succinctly
described by using the notation of first-order logic.
For  example,  rules  (1)  and  (2)  can  be  expressed  as  the 
following  first-order  rules: 

∀x,y,z [give(x,y,z) ⇒ own(y,z)]  (1)

∀u,v [own(u,v) ⇒ can-sell(u,v)]  (2)

where give is a three-place predicate with arguments:
giver, recipient, and give-object; own is a two-place predi¬
cate  with  arguments:  owner  and  own-object;  and  can-sell 
is  also  a  two-place  predicate  with  arguments:  potential- 
seller  and  can-sell-object.  The  use  of  quantifiers  and 
variables  allows  the  expression  of  general,  instantiation- 
independent knowledge and helps in specifying the
systematic correspondence between predicate arguments.
A fact may be expressed as a predicate instance (atomic
formula).  For  example,  the  fact  “John  gave  Mary  Bookl” 
may  be  expressed  as  give(John,  Mary,  Bookl). 

A  connectionist  network  must  solve  three  technical 
problems  in  order  to  incorporate  systematicity.  We  dis¬ 
cuss  these  problems  in  the  following  three  sections. 

2.1.  Dynamic  representation  of  facts:  Instantiating 
predicates 

A  reflexive-reasoning  system  should  be  capable  of  repre¬ 
senting  facts  in  a  rapid  and  dynamic  fashion.  Observe  that 
the  reasoning  process  generates  inferred  facts  dynam¬ 
ically  and  the  reasoning  system  should  be  capable  of 
representing  these  inferred  facts.  Furthermore,  the  rea¬ 
soning  system  must  interact  with  other  processes  that 
communicate  facts  and  pose  queries  to  it,  and  the  system 
should  be  capable  of  dynamically  representing  such  facts 
and  queries. 

The  dynamic  representation  of  facts  poses  a  problem 
for  standard  connectionist  models.  Consider  the  fact 
give(John,  Mary,  Bookl).  This  fact  cannot  be  represented 
dynamically  by  simply  activating  the  representations  of 
the  arguments  giver,  recipient,  and  give-object,  and  the 
constituents  "John,”  “Mary,”  and  “Bookl.”  Such  a  repre¬ 
sentation  would  suffer  from  cross-talk  and  would  be 
indistinguishable from the representations of give(Mary,
John, Book1) and give(Book1, Mary, John). The problem
is that this fact - like any other instantiation of an n-ary
predicate - is a composite structure: it does not merely
express an association between the constituents “John,”
“Mary,” and “Book1”; rather, it expresses a specific relation
wherein each constituent plays a distinct role. Thus a


Shastri  &  Ajjanagadde:  Association  to  reasoning 

fact is essentially a collection of bindings between predicate
arguments and fillers. For example, the fact
give(John, Mary, Book1) is the collection of argument-filler
bindings (giver = John, recipient = Mary,
give-object = Book1). Hence representing a dynamic fact
amounts  to  representing,  dynamically,  the  appropriate 
bindings  between  predicate  arguments  and  fillers. 

The  dynamic  representation  of  facts  should  also  sup¬ 
port  the  simultaneous  representation  of  multiple  facts 
such as give(John, Mary, Book1) and give(Mary, John,
Car3) without “creating” ghost facts such as give(Mary,
John, Book1).
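
The cross-talk problem described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names `active_set` and `bindings` and the tuple layout of a fact are hypothetical and do not come from the model itself. The sketch contrasts an encoding that merely activates argument and filler nodes with one that records argument-filler bindings.

```python
# Illustrative sketch only; all names here are hypothetical, not part of the model.

GIVE_ROLES = ("giver", "recipient", "give-object")


def active_set(fact):
    """Naive encoding: the unordered set of active argument and filler nodes."""
    _predicate, roles, fillers = fact
    return frozenset(roles) | frozenset(fillers)


def bindings(fact):
    """Binding-based encoding: the set of (argument, filler) pairs."""
    _predicate, roles, fillers = fact
    return frozenset(zip(roles, fillers))


fact_a = ("give", GIVE_ROLES, ("John", "Mary", "Book1"))
fact_b = ("give", GIVE_ROLES, ("Mary", "John", "Book1"))

# The naive encoding cannot tell the two facts apart (cross-talk) ...
same_nodes = active_set(fact_a) == active_set(fact_b)
# ... whereas the collections of bindings differ.
same_bindings = bindings(fact_a) == bindings(fact_b)

print(same_nodes, same_bindings)  # True False
```

The point of the comparison is that any adequate dynamic representation must carry the information held in the second encoding, not merely the first.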

2.1.1.  Static  versus  dynamic  bindings.  A  connectionist 
encoding  that  represents  the  bindings  associated  with  the 
fact give(John, Mary, Book1) without cross-talk is illustrated
in Figure 1 (cf. Shastri 1988b; Shastri & Feldman
1986). Each triangular binder node binds the appropriate
filler  to  the  appropriate  argument  and  the  focal  node 
give-23  provides  the  requisite  grouping  between  the  set 
of  bindings  that  make  up  the  fact.  The  binder  nodes 
become  active  on  receiving  two  inputs  and  thus  serve  to 
retrieve  the  correct  filler,  given  a  fact  and  an  argument 
(and  vice  versa).  Such  a  static  encoding,  using  physically 
interconnected  nodes  and  links  to  represent  argument- 
filler  bindings,  is  suitable  for  representing  stable  and 
long-term  knowledge,  because  the  required  focal  and 
binder  nodes  may  be  learned  (or  recruited)  over  time  in 
order  to  represent  new  but  stable  bindings  of  constitu¬ 
ents.  This  scheme,  however,  is  implausible  for  repre¬ 
senting  bindings  required  to  encode  dynamic  structures 
that  will  arise  during  language  understanding  and  visual 
processing.  Such  dynamic  bindings  may  have  to  be  repre¬ 
sented  very  rapidly  -  within  a  hundred  milliseconds  - 
and  it  is  unlikely  that  there  exist  mechanisms  that  can 
support  widespread  structural  changes  and  growth  of  new 
links  within  such  time  scales.  An  alternative  would  be  to 
assume  that  interconnections  between  all  possible  pairs  of 
arguments  and  fillers  already  exist.  These  links  normally 
remain “inactive,” but the appropriate subset of these links
becomes  “active”  temporarily  to  represent  dynamic  bind¬ 
ings  (Feldman  1982;  von  der  Malsburg  1986).  This  ap¬ 
proach,  however,  is  also  problematic  because  the  number 
of  all  possible  argument-filler  bindings  is  extremely  large, 
and having preexisting structures for representing all
these bindings will require a prohibitively large number
of nodes and links. Techniques for representing
argument-filler  bindings  on  the  basis  of  the  von  Neumann 
architecture  also  pose  difficulties  because  they  require 
communicating  names  or  pointers  of  fillers  to  appropriate 
argument  nodes  and  vice  versa.  As  pointed  out  earlier, 
the  storage  and  processing  capacity  of  nodes  as  well  as  the 
resolution  of  their  outputs  is  not  sufficient  to  store,  pro¬ 
cess,  and  communicate  names  or  pointers. 

2.2. Inference, propagation of dynamic bindings and
the encoding of rules

The second technical problem that a connectionist reasoning
system must solve concerns the dynamic generation
of inferred facts. For example, starting with a dynamic
representation of give(John, Mary, Book1), the
state of a network encoding rules (1) and (2) should evolve
rapidly to include the dynamic representations of the
inferred facts: own(Mary, Book1) and can-sell(Mary,
Book1). This process should also be free of cross-talk and
not lead to spurious bindings.

Generating  inferred  facts  involves  the  systematic  prop¬ 
agation  of  dynamic  bindings  in  accordance  with  the  rules 
embodied  in  the  system.  A  rule  specifies  antecedent  and 
consequent  predicates  and  a  correspondence  between 
the  arguments  of  these  predicates.  For  example,  the  rule 
∀x,y,z [give(x,y,z) ⇒ own(y,z)] specifies that a give event
results  in  an  own  event  wherein  the  recipient  of  a  give 
event  corresponds  to  the  owner  of  an  own  event  and  the 
give-object  of  a  give  event  corresponds  to  the  own-object 
of  an  own  event.  An  application  of  a  rule  (i.e.,  a  step  of 
inference)  therefore  amounts  to  taking  an  instance  of  the 
antecedent  predicate(s)  and  creating,  dynamically,  an 
instance  of  the  consequent  predicate,  with  the  argument 
bindings  of  the  latter  being  determined  by  applying  the 
argument  correspondence  specified  in  the  rule  to  the 
argument  bindings  of  the  former.  Thus  the  application  of 
the rule ∀x,y,z [give(x,y,z) ⇒ own(y,z)], in conjunction
with an instance of give, give(John, Mary, Book1), creates
an instance of own with the bindings (owner = Mary,
own-object = Book1). These bindings constitute the inferred
fact own(Mary, Book1). Once the representation of an
inferred fact is established, it may be used in conjunction
with  other  domain  rules  to  create  other  inferred  facts. 
Such  a  chain  of  inference  may  lead  to  a  proliferation  of 
inferred  facts  and  the  associated  dynamic  bindings. 
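
The view of rule application as the propagation of bindings along an argument correspondence can be sketched symbolically. This is a minimal sketch under stated assumptions, not the connectionist mechanism proposed later: the `RULES` table layout and the helper names `apply_rule` and `infer` are hypothetical, and facts are represented as plain dictionaries of argument-filler bindings.

```python
# Illustrative sketch only; the data layout and function names are assumptions.

# Each rule: (antecedent predicate, consequent predicate,
#             {antecedent argument: corresponding consequent argument})
RULES = [
    ("give", "own", {"recipient": "owner", "give-object": "own-object"}),
    ("own", "can-sell", {"owner": "potential-seller",
                         "own-object": "can-sell-object"}),
]


def apply_rule(rule, fact):
    """One step of inference: map the antecedent's bindings onto the consequent."""
    antecedent, consequent, correspondence = rule
    predicate, bindings = fact
    if predicate != antecedent:
        return None
    new_bindings = {correspondence[arg]: filler
                    for arg, filler in bindings.items()
                    if arg in correspondence}
    return (consequent, new_bindings)


def infer(fact, rules=RULES):
    """Chain rule applications until no new inferred fact appears."""
    derived = []
    frontier = [fact]
    while frontier:
        current = frontier.pop(0)
        for rule in rules:
            result = apply_rule(rule, current)
            if result is not None and result not in derived:
                derived.append(result)
                frontier.append(result)
    return derived


start = ("give", {"giver": "John", "recipient": "Mary", "give-object": "Book1"})
for predicate, bindings in infer(start):
    print(predicate, bindings)
```

Note that the giver binding is simply dropped when moving from give to own, because rule (1) specifies no correspondence for it.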

2.3.  Encoding  long-term  facts 

In  addition  to  encoding  domain  rules  such  as  (1)  and  (2),  a 
connectionist  reasoning  system  must  also  be  capable  of 
encoding  facts  in  its  LTKB  and  using  them  during  recall, 
recognition,  query  answering,  and  reasoning.  For  exam¬ 
ple,  we  expect  our  system  to  be  capable  of  encoding  a  fact 
such  as  “John  bought  a  Rolls-Royce”  in  its  LTKB  and  using 
it  to  answer  rapidly  the  query  Did  John  buy  a  Rolls- 
Royce?  We  also  expect  it  to  use  this  fact  in  conjunction 
with  other  knowledge  to  answer  rapidly  queries  such  as 
Does  John  own  a  car?  Observe  that  storing  a  long-term 
fact  would  require  storing  the  associated  bindings  as  a 
static  long-term  structure.  This  structure  should  interact 
with  dynamic  bindings  and  recognize  those  that  match  it. 


2.4.  Dynamic  binding  and  categorization 

As  discussed  at  the  beginning  of  section  2,  the  appro¬ 
priateness  of  a  rule  in  a  specific  situation  may  depend  on 
the  types/features  of  the  argument  fillers  involved  in  that 
situation.  Thus  categorization  plays  a  crucial  role  in  the 
propagation of dynamic bindings during reasoning. Consider
the rule: ∀x,y [walk-into(x,y) ⇒ hurt(x)] (i.e., if one
walks  into  something  then  one  gets  hurt).  As  stated,  the 
rule  only  encodes  systematicity  and  underspecifies  the 
relation  between  “walking  into”  and  “getting  hurt.”  It 
would  fire  even  in  the  situation  “John  walked  into  the 
mist” and lead to the inference “John got hurt.” A complete
encoding of the knowledge embodied in the rule
would also specify the types/features of the argument
fillers  of  “walk-into”  for  which  the  application  of  this  rule 
would  be  appropriate.  Given  such  an  encoding,  the  prop¬ 
agation  of  binding  from  the  first  argument  of  walk-into  to 
the  argument  of  hurt  will  occur  only  if  the  fillers  of  the 
arguments  of  walk-into  belong  to  the  appropriate  types 
(we  discuss  the  encoding  of  such  rules  in  sect.  5). 
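
A rule that is sensitive to the types of its argument fillers might be sketched as follows. The sketch is hedged: the type labels `animate` and `solid-object`, the `IS_A` table, the argument name `hurt-agent`, and the function name are all hypothetical choices made for illustration; the actual encoding of such rules is the subject of section 5.

```python
# Illustrative sketch only; type names and the IS_A table are assumptions.

IS_A = {
    "John": {"animate"},
    "Wall": {"solid-object"},
    "Mist": set(),
}


def walk_into_implies_hurt(x, y, types=IS_A):
    """Propagate the binding for x only if both fillers satisfy the type restrictions."""
    x_ok = "animate" in types.get(x, set())
    y_ok = "solid-object" in types.get(y, set())
    if x_ok and y_ok:
        return ("hurt", {"hurt-agent": x})
    return None  # the rule is not appropriate in this situation


print(walk_into_implies_hurt("John", "Wall"))
print(walk_into_implies_hurt("John", "Mist"))
```

With the restriction in place, “John walked into the mist” no longer yields “John got hurt,” while the systematic part of the rule (which argument maps to which) is unchanged.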

The  use  of  categorization  can  also  prevent  certain  cases 
of  cross-talk  in  the  representation  of  dynamic  facts.  For 
example,  categorization  may  prevent  cross-talk  in  the 
representation of buy(Mary, Book1) because spurious
versions of this fact such as buy(Book1, Mary) would
violate category restrictions and, hence, would be unstable.
However, categorization cannot in and of itself solve
the dynamic-binding problem, because it alone cannot
enforce systematicity. For example, categorization cannot
determine that the dynamic fact give(John, Mary, Book1)
should result in the inferred fact own(Mary, Book1) but
not own(John, Book1).

2.5. The dynamic-binding problem in vision
and language

The  need  for  systematically  dealing  with  composite  ob¬ 
jects  in  a  dynamic  manner  immediately  gives  rise  to  the 
dynamic-binding  problem.  Thus  the  dynamic-binding 
problem  occurs  during  any  cognitive  activity  that  admits 
systematicity  and  compositionality.  Consider  vision.  Vi¬ 
sual  object  recognition  involves  the  rapid  grouping  of 
information  over  the  spatial  extent  of  an  object  and  across 
different  feature  maps  so  that  features  belonging  to  one 
object  are  not  confused  with  those  of  another  (Treisman  & 
Gelade  1980).  The  binding  of  features  during  visual  pro¬ 
cessing  is  similar  to  the  binding  of  argument  fillers  during 
reasoning.  In  terms  of  representational  power,  however, 
the  grouping  of  all  features  belonging  to  the  same  object 
can be expressed using unary predicates, but as we
have  seen,  reasoning  requires  the  representation  of  unary 
as  well  as  n-ary  predicates.  A  similar  need  would  arise  in  a 
more  sophisticated  vision  system  that  dynamically  repre¬ 
sents  and  analyzes  spatial  relations  between  objects  or 
parts  of  an  object. 

Although  there  may  be  considerable  disagreement 
over  the  choice  of  primitives  and  the  functional  relation¬ 
ship  between  the  “meaning”  of  a  composite  structure  and 
that  of  its  constituents,  it  seems  apparent  that  a  computa¬ 
tional  model  of  language  should  be  capable  of  computing 
and  representing  composite  structures  in  a  systematic 
and dynamic manner. Thus language understanding
requires a solution to the dynamic-binding problem, to
support  reasoning  as  well  as  syntactic  processing  and  the 
dynamic  linking  of  syntactic  and  semantic  structures. 

3. Solving the dynamic-binding problem

In  this  section  we  describe  solutions  to  three  technical 
problems  associated  with  dynamic  bindings  discussed  in 
sections  2.1  through  2.3.  The  solutions  involve  several 
ideas  that  complement  each  other  and  together  lead  to  a 
connectionist  model  of  knowledge  representation  and 
reflexive  reasoning. 

As  pointed  out  in  section  2.1,  it  is  implausible  to 
represent  dynamic  bindings  by  using  structural  changes, 
prewired  interconnection  networks,  or  by  communicat¬ 
ing  names/pointers  of  arguments  and  fillers.  Instead, 
what  is  required  is  a  neurally  plausible  mechanism  for 
rapidly  and  temporarily  labeling  the  representations  of 
fillers  and  predicate  arguments  to  encode  dynamically 
argument-filler  bindings.  Also  required  are  mechanisms 
for  systematically  propagating  such  transient  labels  and 
allowing  them  to  interact  with  long-term  structures. 

In  the  proposed  system  we  use  the  temporal  structure 
of  node  activity  to  provide  the  necessary  labeling.  Specifi¬ 
cally,  we  represent  dynamic  bindings  between  arguments 
and  fillers  by  the  synchronous  firing  of  appropriate  nodes. 
We  also  propose  appropriate  representations  for  n-ary 
predicates,  rules,  long-term  facts,  and  an  IS-A  hierarchy 
that  facilitate  the  efficient  propagation  and  recognition  of 
dynamic bindings.

The  significance  of  temporally  organized  neural  activity 
has  long  been  recognized  (Freeman  1981;  Hebb  1949; 
Sejnowski  1981).  In  particular,  von  der  Malsburg  (1981; 
1986)  has  proposed  that  correlated  activity  within  a  group 
of  cells  can  be  used  to  represent  the  dynamic  grouping  of 
cells.  He  also  used  temporal  synchrony  and  synapses  that 
can  alter  their  weights  within  hundreds  of  milliseconds  to 
model  sensory  segmentation  and  the  human  ability  to 
attend  to  a  specific  speaker  in  a  noisy  environment  (von 
der  Malsburg  &  Schneider  1986).  Abeles  (1982;  1991)  has 
put  forth  the  hypothesis  that  computations  in  the  cortex 
occur  via  “synfire  chains”  -  propagation  of  synchronous 
activity along diverging and converging pathways between
richly interconnected cell assemblies. Crick (1984)
has also suggested the use of fine temporal coincidence
to represent dynamic bindings, and synchronized
activity across distant regions forms the keystone of
Damasio’s (1989) general framework for memory and
consciousness. Several researchers have reported the
occurrence of synchronous activity in the cat and monkey visual
cortex  and  presented  evidence  in  support  of  the  conjec¬ 
ture  that  the  visual  cortex  may  be  using  synchronous 
and/or  oscillatory  activity  to  solve  the  binding  problem 
(see  sect.  7). 

Recently, other researchers have used temporal synchrony
to solve various aspects of the binding problem in
visual perception (Horn et al. 1991; Hummel & Biederman
1992; Strong & Whitehead 1989). In this work we use
temporal  synchrony  to  solve  a  different  problem,  namely, 
the  representation  of,  and  systematic  reasoning  with, 
conceptual  knowledge.  In  solving  this  problem  we  also 
demonstrate that temporal synchrony can support more
complex representations. The expressiveness and inferential
power of our model exceed that of the models cited
above,  because  our  system  can  represent  dynamic  instan¬ 
tiations  of  n-ary  predicates,  including  multiple  instantia¬ 
tions  of  the  same  predicate. 

Clossman  (1988)  has  used  synchronous  activity  to  rep¬ 
resent  argument-filler  bindings,  but  he  has  not  suggested 
an  effective  representation  of  "rules”  (and  long-term 
facts).  Consequently,  his  system  could  not  propagate 
dynamic  bindings  to  perform  inferences. 

As  an  abstract  computational  mechanism,  temporal 
synchrony  can  be  related  to  the  notion  of  marker  passing 
(Fahlman 1979; Quillian 1968). Fahlman has proposed
the design of a marker-passing machine (NETL) consisting
of  a  parallel  network  of  simple  processors  and  a  serial 
computer  that  controlled  the  operation  of  the  parallel 
network.  Each  node  could  store  a  small  number  of  dis¬ 
crete  “markers”  (or  tags)  and  each  link  could  propagate 
markers  between  nodes  under  the  supervision  of  the 
network  controller.  Fahlman  showed  how  his  machine 
could  compute  transitive  closure  and  set  intersection  in 
parallel,  and  in  turn,  solve  a  class  of  inheritance  and 
recognition problems efficiently. Fahlman’s system, however,
was not neurally plausible. First, nodes in the
system  were  required  to  store,  match,  and  selectively 
propagate  marker  bits.  Although  units  with  the  appropri¬ 
ate  memory  and  processing  characteristics  may  be  readily 
realized,  using  electronic  hardware,  they  do  not  have  any 
direct  neural  analog.  Second,  the  marker-passing  system 
operated  under  the  strict  control  of  a  serial  computer  that 
specified,  “at  every  step  of  the  propagation,  exactly  which 
types  of  links  were  to  pass  which  markers  in  which 
directions”  (Fahlman  1979). 

The  relation  between  marker  passing  and  temporal 
synchrony  can  be  recognized  by  noting  that  nodes  firing 
in  synchrony  may  be  viewed  as  being  marked  with  the 
same  marker,  and  the  propagation  of  synchronous  activity 
along  a  chain  of  connected  nodes  can  be  viewed  as  the 
propagation  of  markers.  Thus,  in  developing  our  reason¬ 
ing  system  using  temporal  synchrony  we  have  also  estab¬ 
lished  that  marker-passing  systems  can  be  realized  in  a 
neurally  plausible  manner.  In  the  proposed  system,  noth¬ 
ing  has  to  be  stored  at  a  node  in  order  to  mark  it  with  a 
marker.  Instead,  the  time  of  firing  of  a  node  relative  to 
other  nodes  and  the  coincidence  between  the  time  of 
firing  of  a  node  and  that  of  other  nodes  has  the  effect  of 
marking a node with a particular marker. A node in our
system is not required to match any markers; it simply has
to  detect  whether  appropriate  inputs  are  coincident.  Our 
approach  enables  us  to  realize  the  abstract  notion  of 
markers  by  using  time,  a  dimension  that  is  forever  pre¬ 
sent,  and  giving  it  added  representational  status. 

As  we  shall  see,  the  neural  plausibility  of  our  system 
also  results  from  its  ability  to  operate  without  a  central 
controller.  Once  a  query  is  posed  to  the  system  by 
activating  appropriate  nodes,  it  computes  the  solution 
without  an  external  controller  directing  the  activity  of 
nodes at each step of processing (see also sect. 9.1).

Several  other  connectionist  solutions  to  the  binding 
problem  have  been  suggested  (Barnden  &  Srinivas  1991; 
Dolan  &  Smolensky  1989;  Feldman  1982;  Lange  &  Dyer 
1989;  Touretzky  &  Hinton  1988).  These  alternatives  are 
discussed in section 9.3.

3.1. Representing dynamic bindings

Refer to the representation of some predicates and entities
shown in Figure 2. Observe that predicates, their
arguments,  and  entities  are  represented  by  using  distinct 
nodes.  For  example,  the  ternary  predicate  give  is  repre¬ 
sented  by  the  three  argument  nodes  labeled  giver,  recip, 
and  g-obj  together  with  an  associated  “node”  depicted  as  a 
dotted  rectangle  (the  role  of  the  latter  is  specified  in  sect. 
3.3).  For  simplicity  we  assume  that  each  argument  node 
corresponds  to  an  individual  connectionist  node;  this  is  an 
idealization.  In  section  7.3  we  discuss  how  each  argument 
node  corresponds  to  an  ensemble  of  nodes.  Nodes  such  as 
John  and  Mary  correspond  to  focal  nodes  of  more  elabo¬ 
rate  connectionist  representations  of  the  entities  “John” 
and  “Mary.”  Information  about  the  attribute  values  (fea¬ 
tures)  of  “John"  and  its  relationship  to  other  concepts  is 
encoded  by  linking  the  focal  node  John  to  appropriate 
nodes.  (Details  of  such  an  encoding  may  be  found  in 
Shastri  1991;  Shastri  &  Feldman  1986).  As  explained  by 
Feldman  (1989),  a  focal  node  may  also  be  realized  by  a 
small  ensemble  of  nodes. 

Dynamic  bindings  are  represented  in  the  system  by  the 
synchronous  firing  of  appropriate  nodes.  Specifically,  a 
dynamic  binding  between  a  predicate  argument  and  its 
filler  is  represented  by  the  synchronous  firing  of  nodes 
that  represent  the  argument  and  the  filler.  With  refer¬ 
ence  to  the  nodes  in  Figure  2,  the  dynamic  bindings 
(giver = John, recipient = Mary, give-object = Book1) are
represented  by  the  rhythmic  pattern  of  activity  shown  in 
Figure  3.  These  bindings  encode  the  dynamic  fact 
give(John,  Mary,  Bookl).  The  absolute  phase  of  firing  of 
filler  and  argument  nodes  is  not  significant  -  what  matters 
is  the  coincidence  (or  the  lack  thereof)  in  the  firing  of 
nodes. The activity of the dotted rectangular nodes is not
significant at this point and is not specified. As another
example, consider the firing pattern shown in Figure 4.
This pattern of activation represents the single binding
(giver = John) and corresponds to the partially instantiated
fact give(John,x,y) (i.e., “John gave someone
something”).

Figure 3. Rhythmic pattern of activation representing the
dynamic bindings (giver = John, recipient = Mary, give-object
= Book1). These bindings constitute the fact give(John, Mary,
Book1). The binding between an argument and a filler is
represented by the in-phase firing of associated nodes.

Figure  5  shows  the  firing  pattern  of  nodes  correspond¬ 
ing  to  the  dynamic  representation  of  the  bindings  (giver  = 
John,  recipient  =  Mary,  give-object  =  Bookl,  owner  = 
Mary, own-object = Book1, potential-seller = Mary,
can-sell-object = Book1). These bindings encode the facts
give(John,  Mary,  Bookl),  own(Mary,  Bookl),  and  can- 
sell(Mary,  Bookl).  Observe  that  the  (multiple)  bindings 
between  Mary  and  the  arguments  recipient,  owner,  and 
potential-seller  are  represented  by  these  argument  nodes 
firing  in-phase  with  Mary.  Further,  the  individual  con¬ 
cepts  Mary,  Bookl,  and  John  are  firing  out  of  phase  and 
occupy  distinct  phases  in  the  rhythmic  pattern  of  activity. 


Figure 2. Encoding predicates and individual concepts:
Distinct predicates and arguments are encoded using distinct nodes
(in sect. 7.4 we discuss how nodes may be replaced by an
ensemble of nodes). The hatched lines below concept nodes are
intended to highlight that these nodes are just focal nodes of a
much richer representation of concepts.


Figure 4. Representation of the dynamic binding (giver =
John) that constitutes the partially instantiated fact “John gave
someone something.”




BEHAVIORAL  AND  BRAIN  SCIENCES  (1993)  16  3 




Figure 5. Pattern of activation representing the dynamic
bindings (giver = John, recipient = Mary, give-object = Book1,
owner = Mary, own-object = Book1, potential-seller = Mary,
can-sell-object = Book1). These bindings constitute the facts
give(John, Mary, Book1), own(Mary, Book1), and can-sell(Mary,
Book1). The transient representation of an entity is simply a
phase within an oscillatory pattern of activity. The number of
distinct phases required to represent a set of dynamic bindings
equals only the number of distinct entities involved in the
bindings. In this example three distinct phases are required.
The bindings between Mary and the arguments recipient,
owner, and potential-seller are represented by the in-phase
firing of the appropriate argument nodes with Mary.


This highlights significant aspects of the proposed
solution:

1.  The  transient  or  short-term  representation  of  an 
entity  is  simply  a  phase  within  a  rhythmic  pattern  of 
activity. 

2. The number of distinct phases within the rhythmic
activation  pattern  only  equals  the  number  of  distinct 
entities  participating  in  the  dynamic  bindings;  this  does 
not  depend  on  the  total  number  of  dynamic  bindings 
represented  by  the  activation  pattern. 

3.  The  number  of  distinct  entities  that  can  participate 
in  dynamic  bindings  at  the  same  time  is  limited  by  the 
ratio  of  the  period  of  the  rhythmic  activity  and  the  width 
of  individual  spikes. 

Thus  far  we  have  assumed  that  nodes  firing  in  syn¬ 
chrony  fire  precisely  in-phase.  This  is  an  idealization.  In 
general  we  would  assume  a  coarser  form  of  synchrony, 
where  nodes  firing  with  a  lag  or  lead  of  less  than  w/2  of  one 
another  would  be  considered  to  be  firing  in  synchrony. 
This  corresponds  to  treating  the  width  of  the  “window  of 
synchrony”  to  be  w. 
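
The three observations above, together with the window of synchrony, can be illustrated with a short sketch. Everything named here is an assumption made for illustration: `assign_phases`, `max_entities`, and `in_synchrony` are hypothetical helpers, phases are abstract integer labels rather than firing times, and the numeric arguments in the usage lines are arbitrary units, not physiological values.

```python
# Illustrative sketch only; helper names and numeric values are assumptions.

def assign_phases(facts):
    """Give each distinct entity one phase; each bound argument fires in its filler's phase."""
    entity_phase = {}
    node_phase = {}
    for predicate, bindings in facts:
        for argument, filler in bindings.items():
            phase = entity_phase.setdefault(filler, len(entity_phase))
            node_phase[(predicate, argument)] = phase
    return entity_phase, node_phase


def max_entities(period, spike_width):
    """Upper bound on distinct phases: ratio of the period to the spike width."""
    return int(period // spike_width)


def in_synchrony(t_a, t_b, w):
    """Coarse synchrony: firing times within a lag or lead of less than w/2."""
    return abs(t_a - t_b) < w / 2


facts = [
    ("give", {"giver": "John", "recipient": "Mary", "give-object": "Book1"}),
    ("own", {"owner": "Mary", "own-object": "Book1"}),
    ("can-sell", {"potential-seller": "Mary", "can-sell-object": "Book1"}),
]
entity_phase, node_phase = assign_phases(facts)

print(len(node_phase), "bindings use", len(entity_phase), "phases")
print(max_entities(period=20, spike_width=4))
print(in_synchrony(10.0, 10.4, w=1.0), in_synchrony(10.0, 10.6, w=1.0))
```

Seven bindings are represented with only three phases, one per entity, which is the sense in which the number of phases is independent of the number of bindings.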

3.2.  Encoding  rules  and  propagating  dynamic  bindings 

In section 2.2 we described how a step of inference or rule
application may be viewed as taking an instance of the
antecedent predicate and dynamically creating an instance
of the consequent predicate, with the argument
bindings of the latter being determined by (1) the argument
bindings of the former, and (2) the argument correspondence
specified by the rule. Consequently, the encoding
of a rule should provide a means for propagating
bindings from the arguments of the antecedent predicate
to the arguments of the consequent predicate in accordance
with the argument correspondence specified in the
rule. With reference to Figure 2, encoding the rules
∀x,y,z [give(x,y,z) ⇒ own(y,z)] and ∀u,v [own(u,v) ⇒
can-sell(u,v)] should have the following effect: The state of
activation described by the rhythmic activation pattern
shown in Figure 3 should eventually lead to the rhythmic
activation pattern shown in Figure 5.

The desired behavior may be realized if a rule is
encoded by linking the arguments of the antecedent and
consequent predicates so as to reflect the correspondence
between arguments specified by the rule. For example,
the rule ∀x,y,z [give(x,y,z) ⇒ own(y,z)] can be encoded by
establishing links between the arguments recipient and
give-object of give and the arguments owner and own-object
of own, respectively. If we also wish to encode the
rule ∀x,y [buy(x,y) ⇒ own(x,y)], we can do so by connecting
the arguments buyer and buy-object of buy to the
arguments owner and own-object of own, respectively.
This encoding is illustrated in Figure 6. In the idealized
model we are assuming that each argument is represented
as a single node and each argument correspondence
is encoded by a one-to-one connection between the
appropriate argument nodes. As discussed in section 7.3,
however,  each  argument  will  be  encoded  as  an  ensemble 
of  nodes  and  each  argument  correspondence  will  be 
encoded  by  many-to-many  connections  between  the  ap¬ 
propriate  ensembles  (for  a  preview  see  Fig.  26). 

Figure 6. Encoding of predicates, individual concepts, and
the rules ∀x,y,z [give(x,y,z) ⇒ own(y,z)], ∀x,y [own(x,y) ⇒
can-sell(x,y)], and ∀x,y [buy(x,y) ⇒ own(x,y)]. Links between
arguments reflect the correspondence between arguments in
the antecedents and consequents of rules.

Arguments and concepts are encoded by using what
we call p-btu nodes (where btu refers to “binary threshold
unit”). These nodes have the following idealized
behavior:

1. If a node A is connected to node B then the activity
of node B will synchronize with the activity of node A. In
particular, a periodic firing of A will lead to a periodic and
in-phase firing of B. We assume that p-btu nodes can
respond in this manner as long as the period of firing, π,
lies in the interval [π_min, π_max]. This interval can be
interpreted as defining the frequency range over which
p-btu nodes can sustain a synchronized response.

2. To simplify the description of our model we will
assume that periodic activity in a node can lead to synchronous
periodic activity in a connected node within one
period.

3. A threshold, n, associated with a node indicates that
the node will fire only if it receives n or more synchronous
inputs.¹⁵ If unspecified, a node’s threshold is assumed to
be one.¹⁶

As  described  above,  interconnected  p-btu  nodes  can 
propagate  synchronous  activity  and  form  chains  of  nodes 
firing  in  synchrony.  In  section  7  we  point  to  evidence  from 
neurophysiology  and  cite  work  on  neural  modeling  that 
suggests  that  the  propagation  of  synchronous  activity  is 
neurally plausible. Given the above interconnection pattern
and node behavior, the initial state of activation
shown  in  Figure  7  will  lead  to  the  state  of  activation  shown 
in  Figure  8  after  one  period,  and  to  the  state  of  activation 
shown  in  Figure  9  after  another  period. 
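
The progression from Figure 7 to Figure 9 can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the authors' implementation: a phase is abstracted to the name of the entity that fires in it, each rule is reduced to its argument-to-argument links, and the names `LINKS` and `step` are introduced here purely for illustration.

```python
# Argument-to-argument links for give => own and own => can-sell (Figure 6).
# The dictionary layout and the names LINKS and step are assumptions made
# for illustration; a phase is abstracted to the entity that fires in it.
LINKS = {
    "recipient": ["owner"],
    "give-object": ["own-object"],
    "owner": ["potential-seller"],
    "own-object": ["can-sell-object"],
}

def step(bindings):
    """Advance one period: every active argument passes its phase to the
    arguments it is linked to, so that they fire in synchrony with it."""
    updated = dict(bindings)
    for arg, entity in bindings.items():
        for target in LINKS.get(arg, []):
            updated[target] = entity
    return updated

# Figure 7: initial bindings.
state0 = {"giver": "John", "recipient": "Mary", "give-object": "Book1"}
state1 = step(state0)  # Figure 8: own(Mary, Book1) is now represented
state2 = step(state1)  # Figure 9: can-sell(Mary, Book1) is now represented
print(state2)
```

Each call to `step` corresponds to one period of oscillation; no rule is selected or scheduled, the links alone determine what becomes active next.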


Figure 7. Initial pattern of activation representing the bindings
(giver = John, recipient = Mary, give-object = Book1).

Figure 8. Pattern of activation after one period of oscillation
(with reference to the state of activation in Figure 7). This state
represents the dynamic bindings: (giver = John, recipient =
Mary, give-object = Book1, owner = Mary, own-object =
Book1). The system has essentially inferred the fact own(Mary,
Book1).

Figure 9. Pattern of activation after two periods of oscillation
(with reference to the state of activation in Fig. 7). This state
represents the dynamic bindings: (giver = John, recipient =
Mary, give-object = Book1, owner = Mary, own-object =
Book1, potential-seller = Mary, can-sell-object = Book1). The
system has essentially inferred the facts own(Mary, Book1) and
can-sell(Mary, Book1).


The encoding of rules by the explicit encoding of the
inferential dependency between predicates and predicate
arguments, in conjunction with the use of temporal
synchrony, provides an efficient mechanism for propagating
dynamic bindings and performing systematic reasoning.
Conceptually, the proposed encoding of rules creates
a directed inferential dependency graph: Each predicate
argument is represented by a node in the graph, and each
rule is represented by links between nodes denoting the
arguments of the antecedent and consequent predicates.
In terms of this conceptualization, the evolution of the
system’s state of activity corresponds to a parallel breadth-first
traversal of the directed inferential dependency
graph. This means that (1) a large number of rules can fire
in parallel, and (2) the time taken to generate a chain of
inference is independent of the total number of rules and
just equals lπ, where l is the length of the chain of
inference and π is the period of oscillatory activity.
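
The claim that inference time depends only on chain length can be illustrated with a breadth-first search over a predicate-level view of the dependency graph. This is a hedged sketch: the function `chain_length`, the dictionary layout, the filler predicates `p0`, `q0`, and so on, and the value of `PERIOD` are all assumptions made for illustration. The sequential search only computes the chain length; in the network itself the traversal is carried out in parallel by the nodes.

```python
from collections import deque

def chain_length(rules, start, goal):
    """Breadth-first search over predicate-level rule links. Returns the
    number of rule applications on the shortest chain from start to goal,
    or None if goal is unreachable."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        pred, depth = frontier.popleft()
        if pred == goal:
            return depth
        for nxt in rules.get(pred, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

PERIOD = 1.0  # the period of oscillation, in arbitrary units (assumed value)

small = {"give": ["own"], "buy": ["own"], "own": ["can-sell"]}
# Add 1000 unrelated rules; the chain from give to can-sell is unchanged.
large = dict(small, **{f"p{i}": [f"q{i}"] for i in range(1000)})

for kb in (small, large):
    length = chain_length(kb, "give", "can-sell")
    print(len(kb), "antecedent predicates, inference time =", length * PERIOD)
```

Both knowledge bases yield a chain of length two, so the modeled inference time is two periods in each case.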

3.3. Encoding long-term facts: Memory as a temporal pattern matcher

As stated in section 2.3, our system must also be capable
of representing long-term facts, which are essentially a
permanent record of a set of bindings describing a particular
situation. The representation of a long-term fact
should encode the bindings pertaining to the fact in a
manner that allows the system to rapidly recognize dynamic
bindings that match the encoded fact. Given that
dynamic bindings are represented as temporal patterns,
it follows that the encoding of a long-term fact should
behave like a temporal pattern matcher that becomes
active whenever the static bindings it encodes match the
dynamic bindings represented in the system’s state of
activation.

Figure 10. Encoding of a long-term fact: The interconnections
shown here encode the static bindings (giver = John, recipient =
Mary, give-object = Book1) that constitute the long-term fact
give(John, Mary, Book1). The pentagon-shaped nodes are T-and
nodes. A T-and node becomes active if it receives an uninterrupted
pulse train. The activation of e:give represents an externally
or internally generated query asking whether the dynamic
bindings indicated by the pattern of activity of argument nodes
match the long-term knowledge encoded in the LTKB. The
activation of c:give represents an assertion by the system that
these dynamic bindings match the knowledge encoded in the
LTKB.

Figure 11. Encoding of the partially instantiated long-term
fact give(John, Mary, x), that is, “John gave Mary something.”
The input from g-obj does not receive an inhibitory input from
any filler.

The design of such a temporal pattern matcher is
illustrated in Figures 10 and 11, which depict the encoding
of the long-term facts give(John, Mary, Book1) and
give(John, Susan, x), respectively (the latter means “John
gave Susan something”). The encoding fully specifies how
a predicate is encoded. Observe that in addition to p-btu
nodes, the encoding also makes use of pentagon-shaped
T-and nodes that have the following idealized behavior:

1. A T-and node becomes active on receiving an uninterrupted
pulse train, that is, a pulse train such that the
gap between adjacent pulses is less than a spike width.
Thus a T-and node behaves like a temporal and node. On
becoming active, such a node produces an output pulse
train similar to the input pulse train.

2. Note that a T-and node driven by a periodic input
consisting of a train of pulses of width comparable to the
period π will produce a periodic train of pulses of width
and periodicity π. We assume that a T-and node can
behave in this manner as long as the period of the input
pulse train lies in the interval [π_min, π_max].

3. A threshold, n, associated with a T-and node indicates
that the node will fire only if it receives n or more
synchronous pulse trains. If unspecified, n is assumed to
be one.

An n-ary predicate P is encoded by using two T-and
nodes and n p-btu nodes. One of these T-and nodes is
referred to as the enabler and the other as the collector.
An enabler will be referred to as e:P and drawn pointing
upward whereas a collector will be referred to as c:P and
drawn pointing downward. With reference to Figures 10
and 11, the ternary predicate give is represented by the
enabler e:give, the collector c:give, and the three argument
nodes giver, recip, and g-obj. The representational
significance of the enabler and collector nodes is as
follows. The enabler e:P of a predicate P has to be
activated whenever the system is queried about P. Such a
query may be posed by an external process or generated
internally by the system itself during an episode of reasoning
(see sect. 4.4). On the other hand, the system
activates the collector c:P of a predicate P whenever the
dynamic bindings of the arguments of P match the knowledge
encoded in the LTKB.

A long-term fact is encoded using a T-and node which
receives an input from the enabler node of the associated
predicate. This input is modified by inhibitory links from
argument nodes of the associated predicate. If an argument
is bound to an entity, the modifier input from the
argument node is in turn modified by an inhibitory link
from the appropriate entity node. The output of the T-and
node encoding a long-term fact is connected to the collector
of the associated predicate. We will refer to the T-and
node associated with a long-term fact as a fact node. Note
that there is only one enabler node, one collector node,
and one set of argument nodes for each predicate. These
nodes are shared by all the long-term facts pertaining to
that predicate.

It can be shown that a fact node becomes active if and only if
the static bindings it encodes match the dynamic bindings
represented in the network’s state of activation. As stated above,
e:P becomes active whenever any query involving the predicate
P is represented in the system. Once active, e:P outputs an
uninterrupted pulse train that propagates to various fact nodes
attached to e:P. Now the pulse train arriving at a fact node will be
interrupted by an active argument of P, unless the filler of this
argument specified by the long-term fact is firing in synchrony
with the argument. But a filler and an argument will be firing in
synchrony if and only if they are bound in the dynamic bindings.
Thus a fact node will receive an uninterrupted pulse train if and only if
the dynamic bindings represented in the system’s state of
activation are such that either an argument is unbound, or if
bound, the argument filler in the dynamic binding matches the
argument filler specified in the long-term fact. The reader may
wish to verify that the encodings given in Figures 10 and 11 will
behave as expected.

The encoding of the long-term fact give(John, Mary,
Book1) will recognize dynamic bindings that represent
dynamic facts such as give(John, Mary, Book1),
give(John, Mary, x), give(x, Mary, y), and give(x, y, z). However,
it will not recognize those that represent give(Mary,
John, Book1) or give(John, Susan, x). Similarly, the encoding
of the long-term fact give(John, Susan, x) will
recognize dynamic bindings that encode give(John, Susan,
x), give(x, Susan, y), and give(x, y, z), but not
give(Susan, John, x) or give(John, Susan, Car7).
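
The matching condition can be restated as a short predicate and checked against the examples just listed. The sketch below is an abstraction, not the pulse-level mechanism: `fact_matches` is a hypothetical helper, arguments are compared by position, and `None` stands for an argument that is unbound in the query or unspecified in the long-term fact.

```python
def fact_matches(fact, query):
    """Return True if the fact node for `fact` would receive an
    uninterrupted pulse train under the dynamic bindings in `query`.

    An unbound query argument (None) never interrupts the pulse train.
    A bound query argument interrupts it unless the fact specifies the
    same filler; an unspecified fact argument (None) offers no filler to
    block the inhibition, so any bound query argument interrupts there.
    """
    if len(fact) != len(query):
        raise ValueError("fact and query must have the same arity")
    return all(q is None or q == f for f, q in zip(fact, query))

fact1 = ("John", "Mary", "Book1")   # give(John, Mary, Book1)
fact2 = ("John", "Susan", None)     # give(John, Susan, x)

print(fact_matches(fact1, ("John", "Mary", None)))     # give(John, Mary, x)
print(fact_matches(fact1, ("Mary", "John", "Book1")))  # give(Mary, John, Book1)
print(fact_matches(fact2, ("John", "Susan", "Car7")))  # give(John, Susan, Car7)
```

The first call succeeds and the other two fail, in agreement with the cases enumerated in the text.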
3.4. Dynamic bindings and temporal synchrony

Given the representation of dynamic bindings and the
encoding of rules described in the preceding sections,
one may view (1) reasoning as the transient but systematic
propagation of a rhythmic pattern of activation, (2) an
object in the dynamic memory as a phase in the above
rhythmic activity, (3) bindings as the in-phase firing of
argument and filler nodes, (4) rules as interconnection
patterns that cause the propagation and transformation of
such rhythmic patterns of activation, and (5) facts as
temporal pattern matchers. During an episode of reasoning,
all the arguments bound to a filler become active in
the same phase as the filler, thereby creating transient
“temporal frames” of knowledge grouped together by
temporal synchrony. This can be contrasted with “static”
frames of knowledge where knowledge is grouped together,
spatially, using hard-wired links and nodes.

The system can represent a large number of dynamic
bindings at the same time, provided the number of
distinct entities involved in these bindings does not exceed
⌊π_max/ω⌋, where π_max is the maximum period (or
the lowest frequency) at which p-btu nodes can sustain
synchronous oscillations and ω is the width of the window
of synchrony. Recall that a window of synchrony of ω
implies that nodes firing with a lag or lead of less than ω/2
of one another are considered to be in synchrony. (We
discuss biologically plausible values of π and ω in sect. 7.2
and the psychological implications of these limits in sect.
8.) As described thus far, the system allows the simultaneous
representation of a large number of dynamic facts
but only supports the representation of one dynamic fact
per predicate. (In sect. 6 we discuss a generalization of the
proposed representation that allows multiple dynamic
facts pertaining to each predicate to be active simultaneously.)
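
The capacity limit can be made concrete with a small calculation. This is a minimal sketch, assuming the bound is the integer part of the ratio of the maximum period to the window of synchrony; the function name and the numeric values are illustrative assumptions and are not taken from the source, which defers actual estimates to sect. 7.2.

```python
import math

def max_distinct_entities(pi_max, omega):
    """Upper bound on the number of distinct phases (entities) that fit in
    one period: the integer part of pi_max / omega (assumed form)."""
    if pi_max <= 0 or omega <= 0:
        raise ValueError("pi_max and omega must be positive")
    return math.floor(pi_max / omega)

# Illustrative values only, in milliseconds.
print(max_distinct_entities(pi_max=30, omega=6))  # 5
```

Note that the bound limits the number of distinct entities, not the number of bindings: any number of arguments may share a phase with the same entity.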

Although synchronous activity is central to the representation
and propagation of bindings, the system does not
require a global clock or a central controller. The propagation
of in-phase activity occurs automatically - once the
system’s state of activation is initialized to represent an
input situation by setting up appropriate dynamic bindings,
the system state evolves automatically to represent
the dynamic bindings corresponding to situations that
follow from the input situation.

Reasoning is the spontaneous outcome of the system’s
behavior. The system does not encode syntactic rules of
inference such as modus ponens. There is no separate
interpreter or inference mechanism in the system that
manipulates and rewrites symbols. The encoding of the
LTKB is best viewed as a vivid internal model of the
agent’s environment. When the nodes in this model are
activated to reflect a particular situation in the environment,
the model simulates the behavior of the external
world and dynamically creates a vivid model of the state of
affairs resulting from the given situation. The system is
clearly not a rule-following system. At the same time it is
not rule described or rule governed in the sense that a
falling apple is. As Hatfield (1991) argues, the system is
best described as being rule instantiating.

3.5.  From  mechanisms  to  systems 

The mechanisms proposed in the previous sections provide
the building blocks for a connectionist system that
can represent and reason with knowledge involving n-ary
predicates and variables. These mechanisms may interact
in different ways to realize different sorts of reasoning
behavior. For example, they can lead to a forward-reasoning
system that can perform predictive inferences.
Our discussion in the previous sections was in the context
of such a system.

The proposed mechanisms may also be used to create a
backward-reasoning system that behaves as follows: If the
system’s state of activation is initialized to represent a
query, it attempts to answer the query based on the
knowledge encoded in its LTKB. A backward-reasoning
system may be generalized to perform explanatory inferences.
If the state of such a system is initialized to
represent an input “situation,” it will automatically attempt
to explain this situation on the basis of knowledge
in its LTKB and a “minimal” set of assumptions.

With the aid of additional mechanisms it is possible
to design a system that performs both predictive and explanatory
inferences. Such a system would make predictions
based on incoming information and at the same time
seek explanations for, and test the consistency of, this
information.


4.  A  backward-reasoning  system 

This section describes a backward-reasoning system
based on the representational mechanisms described in
section 3. The system encodes facts and rules in its LTKB
and answers queries on the basis of this knowledge. For
example, if the system encodes rules ∀x,y,z [give(x,y,z) ⇒
own(y,z)] and ∀u,v [own(u,v) ⇒ can-sell(u,v)], and the
long-term fact “John bought Porsche7,” it will respond yes
to queries such as, Does John own Porsche7? or Can John
sell something? The time taken to respond yes to a query
is only proportional to the length of the shortest derivation
of the query and is independent of the size of the
LTKB.

In subsequent sections we describe several extensions
of the backward-reasoning system. In section 5 we show
how the system may be combined with an IS-A hierarchy
that encodes entities, types (categories), and the
super-/subconcept relations between them. The augmented
system allows the occurrence of types, nonspecific
instances of types, as well as entities in rules,
facts, and queries. This in turn makes it easier to encode
the appropriateness aspect of rules. An extension of the
system to perform abduction is described in Ajjanagadde
(1991).


4.1. The backward-reasoning system - a functional specification

The reasoning system can encode rules of the form:¹⁷

∀x_1,...,x_m [P_1(...) ∧ P_2(...) ∧ ... ∧ P_n(...) ⇒ ∃z_1,...,z_l Q(...)]

The arguments of the P_i's are elements of {x_1, x_2, ..., x_m}. An
argument of Q is either an element of {x_1, x_2, ..., x_m}, or
an element of {z_1, z_2, ..., z_l}, or a constant. It is required
that any variable occurring in multiple argument positions
in the antecedent of a rule must also appear in the
consequent. The significance of this constraint is discussed
in section 4.9. Additional examples of rules are:

∀x,y,t [omnipresent(x) ⇒ present(x,y,t)]
Anyone who is omnipresent is present everywhere at
all times.

∀x,y [born(x,y) ⇒ ∃t present(x,y,t)]
Everyone must have been present at his or her birthplace
sometime.

∀x [triangle(x) ⇒ number-of-sides(x, 3)]

∀x,y [sibling(x,y) ∧ born-together(x,y) ⇒ twins(x,y)]

Facts are assumed to be partial or complete instantiations
of predicates. In other words, facts are atomic
formulae of the form P(t_1, t_2, ..., t_k), where the t_i's are either
constants or distinct existentially quantified variables.
Some examples of facts are:

give(John, Mary, Book1); John gave Mary Book1.
give(x, Susan, Ball2); Someone gave Susan Ball2.
buy(John, x); John bought something.
own(Mary, Ball1); Mary owns Ball1.
omnipresent(x); There exists someone who is omnipresent.
triangle(A3); A3 is a triangle.
sibling(Susan, Mary); Susan and Mary are siblings.
born-together(Susan, Mary); Susan and Mary were born at the same time.

A query has the same form as a fact. A query, all of
whose arguments are bound to constants, corresponds to
the yes-no question, “Does the query follow from rules
and facts encoded in the long-term memory of the system?”
A query with existentially quantified variables,
however, has several interpretations. For example, the
query P(a,x), where a is a constant and x is an existentially
quantified argument, may be viewed as the yes-no query:
“Does P(a,x) follow from the rules and facts for some value
of x?” Alternatively this query may be viewed as the wh-query:
“For what values of x does P(a,x) follow from the
rules and facts in the system’s long-term memory?” Table
1 lists some queries, their interpretation(s), and their
answer(s).

In  describing  the  backward  reasoner  we  begin  by 
making  several  simplifying  assumptions.  We  assume  that 
rules  have  a  single  predicate  in  the  antecedent  and  that 
constants  and  existentially  quantified  variables  do  not 
appear  in  the  consequents  of  rules.  We  also  restrict 
ourselves  to  yes-no  queries  at  first.  Subsequent  sections 
will  provide  the  relevant  details. 


4.2. Encoding rules and facts in long-term memory

Figure 12 depicts the encoding of the rules ∀x,y,z
[give(x,y,z) ⇒ own(y,z)]; ∀x,y [buy(x,y) ⇒ own(x,y)]; and
∀x,y [own(x,y) ⇒ can-sell(x,y)], and the facts give(John,
Mary, Book1), buy(John, x), and own(Mary, Ball1).

As stated in section 3, a constant (i.e., an entity) is
represented by a p-btu node, and an n-ary predicate is
represented by a pair of T-and nodes and n p-btu nodes.
One of the T-and nodes is referred to as the enabler and
the other as the collector. An enabler is drawn pointing
upward and is named e:[predicate-name]. A collector is
drawn pointing downward and is named c:[predicate-name].
The enabler, e:P, of a predicate P has to be
activated whenever the system is queried about P. As we
shall see, such a query may be posed by an external
process or generated internally by the system during an
episode of reasoning. On the other hand, the system
activates the collector, c:P, of a predicate P whenever the
current dynamic bindings of the arguments of P match the
long-term knowledge encoded in the system. Each fact is
encoded using a distinct T-and node that is interconnected
with appropriate enabler, collector, argument,
and entity nodes (see sect. 3.3).

Table 1. Interpretation of some queries and their answers

Query | yes-no form (answer) | wh-form (answer)
own(Mary, Ball1) | Does Mary own Ball1? (yes) | -
can-sell(Mary, Book1) | Can Mary sell Book1? (yes) | -
can-sell(Mary, x) | Can Mary sell something? (yes) | What can Mary sell? (Book1, Ball1)
own(x, y) | Does someone own something? (yes) | Who owns something? (Susan, Mary, John); What is owned by someone? (Book1, Ball1, Ball2)
can-sell(John, x) | Can John sell something? (yes) | What can John sell? (something, but don’t know what)
present(x, Northpole, 1/1/89) | Was someone present at northpole on 1/1/89? (yes) | Who was present at northpole on 1/1/89? (There was someone, but don’t know who)
number-of-sides(A3, 4) | Does A3 have 4 sides? (no) | -
can-sell(Mary, Ball2) | Can Mary sell Ball2? (no) | -
twins(Susan, Mary) | Are Mary and Susan twins? (yes) | -

Figure 12. A network encoding the rules ∀x,y,z [give(x,y,z)
⇒ own(y,z)], ∀x,y [buy(x,y) ⇒ own(x,y)], and ∀x,y [own(x,y)
⇒ can-sell(x,y)]; and the long-term facts give(John, Mary,
Book1), buy(John, x), and own(Mary, Book2). The links between
arguments are in the reverse direction because the rules
are wired for “backward reasoning.”

A rule is encoded by connecting (1) the collector of the
antecedent predicate to the collector of the consequent
predicate, (2) the enabler of the consequent predicate to
the enabler of the antecedent predicate, and (3) the
argument nodes of the consequent predicate to the argument
nodes of the antecedent predicate in accordance
with the correspondence between these arguments specified
in the rule (see Fig. 12). Notice that the links are
directed from the arguments of the consequent predicate
to the arguments of the antecedent predicate. The direction
of links is reversed because the system performs
backward reasoning.
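
The three kinds of connection that encode a rule can be written out as a list of directed links. This is an illustrative sketch: `encode_rule` and its argument format are assumptions, and node names follow the e:P and c:P convention used in the text.

```python
def encode_rule(antecedent, consequent, arg_map):
    """Return the directed links (source, target) that encode the rule
    antecedent => consequent for backward reasoning.

    arg_map maps each consequent argument node to the antecedent
    argument node it corresponds to in the rule.
    """
    links = [
        (f"c:{antecedent}", f"c:{consequent}"),  # (1) collector to collector
        (f"e:{consequent}", f"e:{antecedent}"),  # (2) enabler to enabler
    ]
    # (3) argument links run from the consequent to the antecedent
    links += [(cons_arg, ante_arg) for cons_arg, ante_arg in arg_map.items()]
    return links

# give(x, y, z) => own(y, z): owner corresponds to recip, o-obj to g-obj.
links = encode_rule("give", "own", {"owner": "recip", "o-obj": "g-obj"})
for src, dst in links:
    print(src, "->", dst)
```

Queries therefore flow from consequent to antecedent along the enabler and argument links, while confirmation flows back from antecedent to consequent along the collector links.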

4.3.  Posing  a  query:  Specifying  dynamic  bindings 

A query is a (partially) specified predicate instance of the
form P(t_1, ..., t_n)?, where the t_i's are either constants (entities)
or existentially quantified variables. Therefore, posing
a query to the system involves specifying the query
predicate and the argument bindings specified in the
query. We will assume that only one external process
communicates with the reasoning system. The possibility
of communication among several modules is discussed in
section 10.4 (also see sects. 10.1-10.3). Let us choose an
arbitrary point in time - say, t_0 - as our point of reference
for initiating the query. The argument bindings specified
in the query are communicated to the network as follows:

1. Let the argument bindings involve m distinct entities:
c_1, ..., c_m. With each c_i, associate a delay δ_i such
that no two delays are within ω of one another and the
longest delay is less than π − ω. Here ω is the width of the
window of synchrony, and π lies in the interval [π_min, π_max].

2. The argument bindings of an entity c_i are indicated
to the system by providing an oscillatory spike train of
periodicity π starting at t_0 + δ_i, to c_i and all arguments of
the query predicate bound to c_i. As a result, a distinct
phase is associated with each distinct entity introduced in
the query and argument bindings are represented by the
synchronous activation of the appropriate entity and argument
nodes.

3. The query predicate is specified by activating e:P,
the enabler of the query predicate P, with a pulse train of
width and periodicity π starting at time t_0.

Observe that posing a query simply involves activating
the enabler node of the query predicate and the arguments
and fillers specified in the query. There is no
central controller that monitors and regulates the behavior
of individual nodes at each step of processing.
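
The assignment of delays to entities when posing a query can be sketched as follows. The helper names `assign_delays` and `spike_times`, the choice of spacing delays exactly one window apart (reading "within" as strictly closer than the window width), and the numeric values are all assumptions made for illustration.

```python
def assign_delays(entities, period, omega):
    """Give each distinct entity a delay so that delays are omega apart
    and the longest delay is below period - omega."""
    delays = {entity: i * omega for i, entity in enumerate(entities)}
    if delays and max(delays.values()) >= period - omega:
        raise ValueError("too many entities for this period and window")
    return delays

def spike_times(t0, delay, period, count):
    """Onset times of the first `count` spikes of a train with the given
    periodicity, starting at t0 + delay."""
    return [t0 + delay + k * period for k in range(count)]

# Query can-sell(Mary, Book1)?: two distinct entities, two phases.
delays = assign_delays(["Mary", "Book1"], period=20, omega=5)
print(delays)
# Mary and p-seller share one train; Book1 and cs-obj share the other.
print(spike_times(0, delays["Mary"], 20, 3))
print(spike_times(0, delays["Book1"], 20, 3))
```

An entity and every argument bound to it receive the same spike train, which is all that is needed to express the binding.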

4.4. The inference process for yes-no queries

Once a query is posed to the system, its state of activation
evolves automatically and produces an answer to the
query. The activation of the collector node of the query
predicate indicates that the answer to the query is yes.
The time taken by the system to produce a yes answer
equals 2π(l + 1), where π is the period of oscillation of
nodes and l equals the length of the shortest derivation of
the query. If the collector node of the query predicate
does not receive any activation within 2(d + 1) periods of
oscillation, where d equals the diameter of the inferential
dependency graph, the answer to the query is “don’t
know.” If we make the closed-world assumption, then a
don’t-know answer can be viewed as a no answer.

We illustrate the inference process with the help of an
example (see Fig. 12). Consider the query can-sell(Mary,
Book1)? (i.e., Can Mary sell Book1?). This query is posed
by providing inputs to the entities Mary and Book1, the
arguments p-seller, cs-obj, and the enabler e:can-sell, as
shown in Figure 13. Observe that Mary and p-seller
receive synchronous activation and so do Book1 and
cs-obj. Let us refer to the phase of activity of Mary and
Book1 as phase-1 and phase-2, respectively.

Figure 13. Activation trace for the query can-sell(Mary,
Book1)? (Can Mary sell Book1?). The query is posed by providing
an oscillatory input to e:can-sell, Mary, Book1, p-seller, and
cs-obj as shown. The activation of c:can-sell indicates a yes
answer.

As a result of the inputs, Mary and p-seller fire synchronously
in phase-1, whereas Book1 and cs-obj fire
synchronously in phase-2 of every period of oscillation.
The node e:can-sell also oscillates and generates a pulse
train of periodicity and pulse width π. The activations
from the arguments p-seller and cs-obj reach the arguments
owner and o-obj of the predicate own, and consequently,
starting with the second period of oscillation,
owner and o-obj become active in phase-1 and phase-2,
respectively. Thereafter, the nodes Mary, owner, and
p-seller are active in phase-1, whereas the nodes Book1,
cs-obj, and o-obj are active in phase-2. At the same time,
the activation from e:can-sell activates e:own. At this point
the system has essentially created two dynamic bindings
- owner = Mary and own-object = Book1. Given that
e:own is also active, the system’s state of activity now
encodes the internally generated query own(Mary,
Book1)? (i.e., Does Mary own Book1?).

The fact node associated with the fact own(Mary, Ball1)
does not match the query and remains inactive. Recall
that fact nodes are T-and nodes and hence become active
only upon receiving an uninterrupted pulse train (see
sect. 3.3). Since Ball1 is not firing, the inhibitory activation
from the argument node o-obj interrupts the activation
going from e:own to the fact node and prevents it from
becoming active.

The activation from owner and o-obj reaches the arguments
recip and g-obj of give, and buyer and b-obj of buy,
respectively. Thus beginning with the third period, arguments
recip and buyer become active in phase-1, whereas
arguments g-obj and b-obj become active in phase-2. In
essence, the system has created new bindings for the
arguments of predicates give and buy. Given that the
nodes e:buy and e:give are also active, the system’s state of
activity now encodes two additional queries: give(x, Mary,
Book1)? and buy(Mary, Book1)?.

The fact node representing the fact buy(John, x) does
not become active because the activation from e:buy is
interrupted by the inhibitory activations from the arguments
buyer and b-obj. (Notice that John is not active.)
The fact node F1, associated with the fact give(John,
Mary, Book1), however, does become active as a result of
the uninterrupted activation it receives from e:give. Observe
that the argument giver is not firing and the inhibitory
inputs from the arguments recip and g-obj are
blocked by the synchronous inputs from Mary and Book1,
respectively. The activation from F1 causes c:give to
become active, and the output from c:give in turn causes
c:own to become active and transmit an output to c:can-sell.
Consequently c:can-sell, the collector of the query
predicate can-sell, becomes active, resulting in an affirmative
answer to the query.
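
The matching behavior traced above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: phases are integers, a silent node is simply absent from a mapping, and an unspecified fact argument (such as x in buy(John, x)) is written as None. The names fact_node_active, arg_phase, and entity_phase are illustrative and are not part of the model.

```python
def fact_node_active(fact, arg_phase, entity_phase):
    """Return True if the fact node receives an uninterrupted pulse train.

    fact:         maps each argument name to its filler in the stored fact,
                  or to None when the fact leaves that argument unspecified.
    arg_phase:    maps each firing argument node to its phase.
    entity_phase: maps each firing entity node to its phase.
    """
    for arg, filler in fact.items():
        phase = arg_phase.get(arg)
        if phase is None:
            continue  # a silent argument node sends no inhibition
        # Inhibition from a firing argument is blocked only by a filler
        # that fires in the same phase.
        if filler is None or entity_phase.get(filler) != phase:
            return False
    return True


# State of activity for the query can-sell(Mary, Book1)?
entity_phase = {"Mary": 1, "Book1": 2}
own = {"owner": "Mary", "o-obj": "Ball1"}
buy = {"buyer": "John", "b-obj": None}
give = {"giver": "John", "recip": "Mary", "g-obj": "Book1"}

own_active = fact_node_active(own, {"owner": 1, "o-obj": 2}, entity_phase)
buy_active = fact_node_active(buy, {"buyer": 1, "b-obj": 2}, entity_phase)
give_active = fact_node_active(give, {"recip": 1, "g-obj": 2}, entity_phase)
```

Under these assumptions only the fact give(John, Mary, Book1) becomes active, in agreement with the trace.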

4.5. Encoding rules with constants and repeated variables in the consequent

In this section we describe how rules containing constants
(entities) and/or existentially quantified variables in the
consequent are encoded. Consider the rule:

∀x1,x2,y [P(x1,x2) ⇒ ∃z Q(x1,x2,y,z,a)]   (3)


The encoding of rule (3) is shown in Figure 14. It uses a new type
of node, which we refer to as a τ-or node (node g1 in Fig. 14).
Such a node behaves like a temporal or node and becomes active
on receiving any input above its threshold and generates an
oscillatory response with a period and pulse width equal to the
maximum period at which the ρ-btu nodes can sustain
synchronous activity.

Node g1 projects inhibitory modifiers to links between argument
and enabler nodes that can block the firing of the rule. The
node g1 ensures that the rule participates in an inference only if
all the conditions implicit in the consequent of the rule are met.
The first condition concerns the occurrence of existentially
quantified variables in the consequent of a rule. Observe that
such a rule only warrants the inference that there exists some
filler of an existentially quantified argument and, hence, cannot
be used to infer that a specific entity fills such an argument.
Therefore, if an existentially quantified variable in the consequent
of a rule gets bound in the reasoning process, the rule
cannot be used to infer the consequent. With reference to rule
(3), the desired behavior is achieved by the link from the
existentially quantified (fourth) argument of Q to g1 and the
inhibitory modifiers emanating from g1. The node g1 will
become active and block the firing of the rule whenever the
fourth argument of Q gets bound to any filler.

The second condition concerns the occurrence of entities in
the consequent of a rule. Rule (3) cannot be used if its fifth
argument is bound to any entity other than a. In general, a rule
that has an entity in its consequent cannot be used if the
corresponding argument gets bound to any other entity during
the reasoning process. In the encoding of rule (3), this constraint
is encoded by a link from the fifth argument of Q to g1 that is in
turn modified by an inhibitory modifier from a. If the fifth
argument of Q gets bound to any entity other than a, g1 will
become active and block the firing of the rule.

Figure 14. Encoding rules with existentially quantified variables
and constants in the consequent: The network encodes the
rule ∀x1,x2,y [P(x1,x2) ⇒ ∃z Q(x1,x2,y,z,a)]. This rule must
not fire during the processing of a query if either the existentially
bound argument z gets bound, or the last argument gets
bound to a constant other than a. The node g1 is a τ-or node. It
projects inhibitory modifiers that block the firing of the rule if
the above condition is violated.

If the same variable occurs in multiple argument positions in
the consequent of a rule, it means that this variable should
either remain unbound or get bound to the same entity. This
constraint can be encoded by introducing a node that receives
inputs from all the arguments that correspond to the same
variable, and becomes active and inhibits the firing of the rule
unless all such arguments are firing in synchrony. Observe that
due to the temporal encoding, arguments bound to the same
entity will fire in the same phase and, hence, a node need only
check that the inputs from appropriate argument nodes are in
synchrony to determine that the arguments are bound to the
same entity. Consider the network fragment, shown in Figure
15, that depicts the encoding of the rule ∀x P(x) ⇒ Q(x,x,a). The
node g2 is like a τ-or node except that it becomes active if it
receives inputs in more than one phase within a period of
oscillation. This behavior ensures that the firing of the rule is
inhibited unless the appropriate arguments are bound to the
same entity.
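
The three conditions on the consequent discussed in this section (an existentially quantified argument must stay unbound, a constant argument may only be bound to that constant, and a repeated variable may fire in only one phase) can be summarized by the following sketch. It abstracts away the nodes g1 and g2 and simply reports whether the rule would be blocked. The representation of the consequent as (kind, name) pairs and the function name rule_blocked are assumptions made for illustration.

```python
def rule_blocked(consequent, arg_phase, entity_phase):
    """Return True if the consequent constraints forbid using the rule.

    consequent:   one (kind, name) pair per argument position, where kind is
                  "var" (universally quantified), "exists", or "const".
    arg_phase:    maps a bound argument position to the phase it fires in.
    entity_phase: maps each firing entity to its phase.
    """
    phases_per_variable = {}
    for position, (kind, name) in enumerate(consequent):
        phase = arg_phase.get(position)
        if phase is None:
            continue  # unbound arguments never violate a constraint
        if kind == "exists":
            return True  # an existential argument must stay unbound
        if kind == "const" and entity_phase.get(name) != phase:
            return True  # bound to an entity other than the constant
        if kind == "var":
            phases_per_variable.setdefault(name, set()).add(phase)
    # g2-style check: a repeated variable firing in more than one phase
    return any(len(phases) > 1 for phases in phases_per_variable.values())


# Consequent of rule (3): Q(x1, x2, y, z, a) with z existentially quantified
rule3 = [("var", "x1"), ("var", "x2"), ("var", "y"),
         ("exists", "z"), ("const", "a")]
# Consequent of the rule in Figure 15: Q(x, x, a)
repeated = [("var", "x"), ("var", "x"), ("const", "a")]
```

For example, rule_blocked(repeated, {0: 1, 1: 2}, {}) reports a blocked rule because the two occurrences of x fire in different phases, whereas binding both positions in phase 1 leaves the rule usable.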

4.6. Encoding multiple antecedent rules

A rule with conjunctive predicates in the antecedent, that is, a
rule of the form P1(...) ∧ P2(...) ∧ ... ∧ Pm(...) ⇒ Q(...), is
encoded using an additional τ-and node that has a threshold of
m. The outputs of the collectors of P1, ..., Pm are connected to
this node, which in turn is connected to the collector of Q. This
additional node becomes active if and only if it receives inputs
from the collector nodes of all the m antecedent predicates. The
interconnections between the argument nodes of the antecedent
and consequent predicates remain unchanged. Figure
16 illustrates the encoding of the multiple antecedent rule
∀x,y P(x,y) ∧ Q(y,x) ⇒ R(x,y). The τ-and node labeled g3 has a
threshold of 2.
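
As a small illustration of the threshold behavior, the additional node can be modeled as a count over the antecedent collectors. The sketch ignores timing and assumes that each collector is simply on or off; the function name and the collector labels are illustrative.

```python
def antecedent_gate_active(collector_active, threshold):
    """Model the additional tau-and node: it fires only if the number of
    active antecedent collectors reaches its threshold."""
    active_count = sum(1 for active in collector_active.values() if active)
    return active_count >= threshold


# Rule of Figure 16: P(x, y) and Q(y, x) imply R(x, y); g3 has threshold 2
both = antecedent_gate_active({"c:P": True, "c:Q": True}, threshold=2)
only_p = antecedent_gate_active({"c:P": True, "c:Q": False}, threshold=2)
```

With both collectors active the node fires and can activate the collector of R; with only c:P active it stays silent.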


Figure 15. Encoding rules where the same variable occurs in
multiple argument positions in the consequent: The network
encodes the rule ∀x P(x) ⇒ Q(x,x,a). The rule must fire only if a
multiply occurring variable is unbound, or all occurrences of the
variable are bound to the same constant. The node g2 is like a
τ-or node except that it becomes active if it receives inputs in
more than one phase within a period of oscillation. On becoming
active it activates the τ-or node g1. The firing of g1 blocks the
firing of the rule whenever the first and second arguments of Q
get bound to different constants. (The encoding also enforces
the constraint that the last argument of Q should not be bound to
any constant other than a.)


Figure 16. The encoding of the rule ∀x,y P(x,y) ∧ Q(y,x) ⇒
R(x,y). The τ-and node labeled g3 has a threshold of 2. Multiple
antecedent rules are encoded by using an additional τ-and node
whose threshold equals the number of predicates in the antecedent.
This node becomes active on receiving inputs from the
collector nodes of all the antecedent predicates.


4.7.  Answering  wh-queries 

As stated in section 4.1, a query with unbound arguments
can be interpreted either as a yes-no query or a wh-query.
To answer a yes-no query the system need only determine
whether there exist some instantiations of the unbound
arguments. To answer a wh-query, however, the system
must also determine the instantiations of unbound arguments
for which the query is true. We describe how the
proposed system can be extended to do so.

Consider the proof of the query can-sell(Mary,x)? with
respect to the network shown in Figure 12. The yes-no
version of this query will be answered in the affirmative
and the two relevant facts own(Mary, Ball1) and
give(John, Mary, Book1) will become active. The answer
to the wh-query What can Mary sell? simply consists of
the entities bound to the arguments o-obj and g-obj,
respectively, of the two active facts. Observe that the
arguments o-obj and g-obj are precisely the arguments
that map to the unbound argument cs-obj of can-sell via
the rules encoded in the system. The system can extract
this information by the same binding propagation mechanism
it uses to map bound arguments. A straightforward
way of doing so is to posit an answer extraction stage that
occurs after the yes-no query associated with the wh-query
has produced a yes answer. For example, given the
query What can Mary sell? the system first computes the
answer to the yes-no query Can Mary sell something? and
activates the facts that lead to a yes answer, namely,
own(Mary, Ball1) and give(John, Mary, Book1). The
answer extraction stage follows and picks out the entities
Ball1 and Book1 as the answers.

Figure 17. Augmented representation of a long-term fact in
order to support answer extraction. For each argument of the
associated predicate there exists a ρ-btu node with a threshold of
2. The node shown as a filled-in pentagon behaves like a τ-and
node except that once activated, it stays active for some time -
say about 20π - even after the inputs are withdrawn.

In order to support answer extraction, the representation
of a fact is augmented as shown in Figure 17. The
representation of a fact involving an n-ary predicate is
modified to include n + 1 additional nodes: For each of
the n arguments there is a two-input ρ-btu node with a
threshold of 2. We refer to such a node as a binder node.
The other node (shown as a filled-in pointed pentagon) is
like a τ-and node except that once activated, it remains so
even after the inputs are withdrawn (for several periods of
oscillations). This node, which we will refer to as a latch
node, receives an answer input and an input from the
associated fact node.

At the end of the first stage, the fact nodes corresponding
to all the relevant facts would have become active. The
output of these nodes in conjunction with the answer
signal will turn on the associated latch nodes and provide
one of the two inputs to the binder nodes. If the associated
yes-no query results in a yes answer (i.e., the collector of
the query predicate becomes active), the desired unbound
arguments of the query predicate are activated in a
distinct phase. The activation of these arguments eventually
leads to the activation of the appropriate arguments
in the facts relevant to answering the query. This provides
an input to the appropriate binder nodes of these facts.
Because the binder nodes were already receiving an input
from a latch node, they become active and produce an
output that activates the associated entities in phase with
the appropriate query arguments. The answer to the wh-query
(i.e., the entities that fill the argument a_i of the
query) will be precisely those entities that are active in
phase with a_i. The time taken by the answer extraction
step is bounded by the depth of the inferential dependency
graph.
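
The two-stage procedure can be illustrated with a short sketch. It assumes that the yes-no stage has already determined which fact nodes are active and which fact arguments are reached by the activation of the unbound query argument. The argument names o-obj and g-obj follow the predicate definitions of the earlier example. The fact labeled F3 is a hypothetical inactive fact added only to show that it is ignored, and extract_answers is an illustrative name.

```python
def extract_answers(facts, active_facts, answer_args, answer_signal=True):
    """Return the entities that end up firing in phase with the query argument.

    facts:        maps a fact name to (predicate, {argument: entity}).
    active_facts: names of fact nodes left active by the yes-no stage.
    answer_args:  (predicate, argument) pairs reached by the activation of
                  the unbound query argument.
    """
    answers = set()
    for name, (predicate, bindings) in facts.items():
        # The latch turns on when the fact node and the answer signal coincide.
        latch_on = answer_signal and name in active_facts
        for argument, entity in bindings.items():
            argument_on = (predicate, argument) in answer_args
            # A binder node has a threshold of 2: it needs both inputs.
            if latch_on and argument_on:
                answers.add(entity)
    return sorted(answers)


facts = {
    "F1": ("give", {"giver": "John", "recip": "Mary", "g-obj": "Book1"}),
    "F2": ("own", {"owner": "Mary", "o-obj": "Ball1"}),
    "F3": ("own", {"owner": "John", "o-obj": "Car1"}),  # hypothetical, inactive
}
answer_args = {("own", "o-obj"), ("give", "g-obj"), ("buy", "b-obj")}

what_can_mary_sell = extract_answers(facts, {"F1", "F2"}, answer_args)
without_signal = extract_answers(facts, {"F1", "F2"}, answer_args,
                                 answer_signal=False)
```

Only the entities Ball1 and Book1 are returned for the query What can Mary sell?, and nothing is returned when the answer signal is absent.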


4.8. Admitting function terms

The expressiveness and reasoning power of the system
can be extended by allowing restricted occurrences of
function terms in rules and facts. Function terms introduce
new entities during the reasoning process. But given
that entities are represented as a phase in the pattern of
activity, an entity introduced by a function term can be
represented by an additional phase in the rhythmic activity.
Thus the reference to “mother-of(Tom)” during an
episode of reasoning should lead to activity in a distinct
phase. This phase would represent the “mother of Tom,”
and any arguments bound to the “mother of Tom” would
now fire in this phase. A provisional solution along these
lines is described by Ajjanagadde (1990).


4.9. Constraints on the form of rules

The encoding of rules described thus far enforces (1) the
correspondence between the arguments of the antecedent
and consequent predicates in a rule, and (2) equality
among arguments in the consequent of a rule. In certain
cases, however, it is difficult for the backward-reasoning
system to enforce equality among arguments in the antecedent
of a rule. Consider the rule ∀x,y P(x,x,y) ⇒ Q(y)
and the query Q(a)?. The processing of this query will
result in the dynamic query P(?,?,a)? - where the first
and second arguments will be left unspecified. Consequently,
the system cannot enforce the condition implicit
in the rule that a long-term fact involving P should match
the query Q(a) only if its first and second arguments are
bound to the same constant. Performing such an equality
test is complicated in a system that allows multiple predicates
in the antecedent of rules and the chaining of
inference. Consider the rule ∀x,y P(x,y) ∧ R(x,y) ⇒ Q(y)
and the query Q(a)?. The predicates P and R may be
derivable from other predicates by a long sequence of rule
applications. Hence to derive the query Q(a)? the system
may have to test the equality of arbitrary pairs of argument
fillers in a potentially large number of facts distributed
across the LTKB. It is conjectured that nonlocal and
exhaustive equality testing cannot be done effectively in
any model that uses only a linear number of nodes in the
size of the LTKB and time that is independent of the size
of the LTKB.

Contrast the situation described above with one
wherein the rule is ∀x,y P(x,x,y) ⇒ Q(x) and the query is
Q(a)?. The dynamic query resulting from the processing
of the query Q(a)? will be P(a,a,y)?. Notice that now the
condition that the first and second arguments of P should
be the same is automatically enforced by the propagation
of bindings and is expressed in the dynamically generated
query at P. The crucial feature of the second situation is
that x, the repeated variable in the antecedent of the rule,
also appears in the consequent and gets bound in the
reasoning process. Thus, for the system to respond to a
query, any variable occurring in multiple argument positions
in the antecedent of a rule that participates in the
answering of the query should also appear in the consequent
of the rule and get bound during the query-answering
process. This constraint is required in a
backward-reasoning system but not in a forward-reasoning
system. In the latter, the rule ∀x,y,z P(x,y) ∧
Q(y,z) ⇒ R(x,z) would be encoded as shown in Figure 18.
The τ-or node with a threshold of 2 receives inputs from
the two argument nodes that should be bound to the same
filler. It becomes active if it receives two inputs in the
same phase and enables the firing of the rule via intermediary
ρ-btu and τ-and nodes. This ensures that the
rule fires only if the second and first arguments of P
and Q, respectively, are bound to the same filler. In
the case of forward reasoning, a rule that has variables
occurring in multiple argument positions in its consequent
can participate in the reasoning process provided
such variables also appear in its antecedent and get bound
during the reasoning process. The restrictions mentioned
above on the form of rules exclude certain inferences
(we discuss these exclusions and their implications in
sect. 8.2).

Figure 18. Encoding a rule with repeated variables in the
antecedent within a forward reasoning system. The figure shows
the encoding of the rule ∀x,y,z P(x,y) ∧ Q(y,z) ⇒ R(x,z). This
rule should fire only if the two arguments in the antecedent
corresponding to variable y get bound to the same constant. The
τ-or node with a threshold of 2 receives inputs from the two
argument nodes that should be bound to the same filler. It
becomes active if it receives two inputs in the same phase and
enables the firing of the rule via intermediary ρ-btu and τ-and
nodes. These nodes have suitable thresholds.
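
The static part of the constraint on backward reasoning (a variable that occurs more than once in the antecedent must also occur in the consequent) is easy to check mechanically. The sketch below does only that. It does not check the dynamic requirement that the variable actually gets bound while a query is answered, and it assumes for simplicity that every argument is a variable. The function name is illustrative.

```python
from collections import Counter


def usable_in_backward_reasoning(antecedent, consequent):
    """Check that every variable repeated in the antecedent also appears
    in the consequent.

    antecedent: list of (predicate, variables) pairs.
    consequent: a single (predicate, variables) pair.
    """
    occurrences = Counter(v for _, variables in antecedent for v in variables)
    consequent_vars = set(consequent[1])
    return all(v in consequent_vars
               for v, count in occurrences.items() if count > 1)


# P(x, x, y) => Q(y): x is repeated but missing from the consequent
rule_a = usable_in_backward_reasoning([("P", ("x", "x", "y"))], ("Q", ("y",)))
# P(x, x, y) => Q(x): x is repeated and appears in the consequent
rule_b = usable_in_backward_reasoning([("P", ("x", "x", "y"))], ("Q", ("x",)))
# P(x, y) and Q(y, z) => R(x, z): y is repeated across predicates
rule_c = usable_in_backward_reasoning(
    [("P", ("x", "y")), ("Q", ("y", "z"))], ("R", ("x", "z")))
```

The third rule fails the check, which is consistent with the text: it is the example that can be handled by the forward-reasoning encoding of Figure 18 but not by the backward-reasoning system.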


5. Integrating the rule-based reasoner with an IS-A hierarchy

The rule-based reasoner described in the previous section
can be integrated with an IS-A hierarchy representing
entities, types (categories), the instance-of relations
between entities and types, and the super-/subconcept
relations between types. For convenience, we will refer
to the instance-of, superconcept, and subconcept relations
collectively as the IS-A relation. The augmented
system allows the occurrence of types as well as entities in
rules, facts, and queries. Consequently, the system can
store and retrieve long-term facts such as “Cats prey on
birds” and “John bought a Porsche” that refer to types (Cat
and Bird) as well as nonspecific instances of types (a
Porsche). The system can also combine rule-based reasoning
with type inheritance. For example, it can infer “John
owns a car” and “Tweety is scared of Sylvester” (the latter
assumes the existence of the rule “If x preys-on y then y is
scared of x” and the IS-A relations “Sylvester is a Cat” and
“Tweety is a Bird”). Combining the reasoning system with
an IS-A hierarchy also facilitates the representation of the
appropriateness aspect of a rule. Recall that appropriateness
concerns the applicability of the systematic aspect of
a rule in a given situation, depending on the types of
argument fillers involved in that situation. As we shall
see, the augmented system allows knowledge in the IS-A
hierarchy to interact with the encoding of the systematic
aspects of a rule in order to enforce type restrictions and
type preferences on argument fillers.

The integration of the reasoner with the IS-A hierarchy
described below is a first cut at enriching the representation
of rules. We only model the instance-of, subconcept,
and superconcept relations and suppress several issues
such as a richer notion of semantic distance, frequency-
and category-size effects, and prototypicality (e.g., see
Lakoff 1987).

Figure 19. Interaction between a rule-based reasoner and an
IS-A hierarchy. The rule component encodes the rule ∀x,y
preys-on(x,y) ⇒ scared-of(y,x) and the facts ∀x:Cat, y:Bird
preys-on(x,y), and ∃x:Cat ∀y:Bird loves(x,y). The first fact is
equivalent to preys-on(Cat, Bird) and states that cats prey on
birds. The second fact states that there is a cat that loves all
birds.

Figure 19 provides an overview of the encoding and reasoning
in the integrated reasoning system. The rule-base
part of the network in Figure 19 encodes the rule ∀x,y
[preys-on(x,y) ⇒ scared-of(y,x)], and the facts ∀x:Cat,
y:Bird preys-on(x,y) and ∃x:Cat ∀y:Bird loves(x,y).
The first fact says “cats prey on birds” and is equivalent
to preys-on(Cat, Bird). The second fact states “there
exists a cat that loves all birds.” The type hierarchy in
Figure 19 encodes the IS-A relationships: is-a(Bird, Animal),
is-a(Cat, Animal), is-a(Robin, Bird), is-a(Canary,
Bird), is-a(Tweety, Canary), is-a(Chirpy, Robin), and
is-a(Sylvester, Cat). Facts involving typed variables are
encoded in the following manner: A typed, universally
quantified variable is treated as being equivalent to its
type. Thus ∀x:Cat, y:Bird preys-on(x,y) is encoded as
preys-on(Cat, Bird). A typed, existentially quantified
variable is encoded using a unique subconcept of the
associated type. Thus in Figure 19, ∃x:Cat ∀y:Bird
loves(x,y) is encoded as loves(Cat-1, Bird), where Cat-1 is
some unique instance of Cat. In its current form, the
system only deals with facts and queries wherein all
existential quantifiers occur outside the scope of universal
quantifiers.
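
The treatment of typed variables in facts can be illustrated by a small sketch that rewrites a fact with typed, quantified variables into a fact over concepts. The tuple representation, the function name encode_typed_fact, and the use of a counter to generate unique instance names are assumptions made for illustration. The sketch also omits the step of recording the new instance (e.g., Cat-1) in the IS-A hierarchy.

```python
from collections import defaultdict


def encode_typed_fact(predicate, typed_args, instance_counter):
    """Replace each typed variable by a concept.

    typed_args:       list of (quantifier, type_name) pairs, where the
                      quantifier is "forall" or "exists".
    instance_counter: per-type counter used to name unique instances.
    """
    encoded = []
    for quantifier, type_name in typed_args:
        if quantifier == "forall":
            # A universally quantified variable is equivalent to its type.
            encoded.append(type_name)
        elif quantifier == "exists":
            # An existentially quantified variable gets a unique instance.
            instance_counter[type_name] += 1
            encoded.append(f"{type_name}-{instance_counter[type_name]}")
        else:
            raise ValueError(f"unknown quantifier: {quantifier}")
    return (predicate, tuple(encoded))


counter = defaultdict(int)
preys_on = encode_typed_fact(
    "preys-on", [("forall", "Cat"), ("forall", "Bird")], counter)
loves = encode_typed_fact(
    "loves", [("exists", "Cat"), ("forall", "Bird")], counter)
```

With these inputs the two facts of Figure 19 are encoded as preys-on(Cat, Bird) and loves(Cat-1, Bird).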

For now let us assume that (1) each concept (type or
entity)  in  the  IS-A  hierarchy  is  encoded  as  a  p-btu  node, 
(2)  each  IS-A  relationship,  say  is-a(A,  B),  is  encoded  using 
two  links  -  a  bottom-up  link  from  A  to  B  and  a  top-down 
link  from  B  to  A,  and  (3)  the  top-down  and  bottom-up 
links  can  be  enabled  selectively  by  built-in,  automatic 
control  mechanisms.  How  this  is  realized  is  explained  in 
section  5.2. 

Figure 20. Activation trace for the query scared-of(Tweety,
Sylvester)? (i.e., Is Tweety scared of Sylvester?).

The time course of activation for the query
scared-of(Tweety, Sylvester)? (Is Tweety scared of Sylvester?) is
given in Figure 20. The query is posed by turning on
e:scared-of and activating the nodes Tweety and Sylvester
in synchrony with the first and second arguments of
scared-of, respectively. The bottom-up links emanating
from Tweety and Sylvester are also enabled. The activation
spreads along the IS-A hierarchy, and eventually
Bird and Cat start firing in synchrony with Tweety and
Sylvester, respectively. At the same time, the activation
propagates in the rule base. As a result, the initial query
scared-of(Tweety, Sylvester)? is transformed into the
query preys-on(Cat, Bird)?, which matches the stored
fact preys-on(Cat, Bird) and leads to the activation of
c:preys-on. In turn c:scared-of becomes active and signals
an affirmative answer.
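
The combined effect of inheritance and rule application in this example can be mimicked symbolically. The sketch below is not a model of the network dynamics: it replaces the spread of synchronous activity by an explicit walk up the hierarchy, and it assumes that each concept has a single parent, as in Figure 19. The names IS_A, FACTS, ancestors, fact_holds, and scared_of are illustrative.

```python
# IS-A links of Figure 19, written as child -> parent
IS_A = {"Bird": "Animal", "Cat": "Animal", "Robin": "Bird", "Canary": "Bird",
        "Tweety": "Canary", "Chirpy": "Robin", "Sylvester": "Cat"}
FACTS = {("preys-on", ("Cat", "Bird"))}


def ancestors(concept):
    """Return the concept followed by all of its superconcepts."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def fact_holds(predicate, args):
    """A stored fact matches if each of its fillers is the query filler
    itself or one of the query filler's superconcepts."""
    for fact_predicate, fact_args in FACTS:
        if (fact_predicate == predicate
                and len(fact_args) == len(args)
                and all(f in ancestors(a) for f, a in zip(fact_args, args))):
            return True
    return False


def scared_of(first, second):
    """Apply the rule preys-on(x, y) => scared-of(y, x) backward."""
    return fact_holds("preys-on", (second, first))


tweety_scared = scared_of("Tweety", "Sylvester")
sylvester_scared = scared_of("Sylvester", "Tweety")
```

The query scared-of(Tweety, Sylvester)? succeeds because Sylvester inherits from Cat and Tweety from Bird, whereas the reversed query finds no matching fact.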

There are advantages to expressing certain rules as
facts. Although the reasoning system described in section
4 can use rules to draw inferences, it cannot retrieve the
rules per se; for knowledge to be retrievable, it must be in
the form of a fact. Hence integrating the rule-based
reasoner with an IS-A hierarchy has added significance,
because it allows certain rulelike knowledge to be expressed
as facts, thereby making it retrievable in addition
to being usable during inference. Consider “Cats prey on
birds.” The rule-based reasoner can only express this as
the rule ∀x,y Cat(x) ∧ Bird(y) ⇒ preys-on(x,y) and use it
to answer queries such as preys-on(Sylvester, Tweety)?.
It, however, cannot answer queries such as preys-on(Cat,
Bird)? that can be answered by the integrated system.

5.1.  Some  technical  problems 

Two technical problems must be solved in order to integrate
the IS-A hierarchy and the rule-based component.
First, the encoding of the IS-A hierarchy should be
capable of representing multiple instantiations of a concept.
For example, in the query discussed above, the
concept animal would receive activation originating at
Tweety as well as Sylvester. We would like the network's
state of activation to represent both the animal Tweety
and the animal Sylvester. This cannot happen if concepts
are represented by a single ρ-btu node because the node
animal cannot fire in synchrony with both Tweety and
Sylvester at the same time. Second, the encoding must
provide built-in mechanisms for automatically controlling
the direction of activation in the IS-A hierarchy so as to
deal correctly with queries containing existentially and
universally quantified variables. The correct treatment of
quantified variables - assuming that all IS-A links are
indefeasible, that is, without exceptions - requires that
activation originating from a concept C that is either an
entity or the type corresponding to a universally quantified
variable in the query should propagate upwards to
all the ancestors of C. The upward propagation checks if
the relevant fact is universally true of some superconcept
of C. The activation originating from a concept C that
appears as an existentially quantified variable in the query
should propagate to the ancestors of C, the descendants of
C, as well as the ancestors of the descendants of C. A
possible solution to these problems has been proposed by
Mani and Shastri (1991) and is outlined below.
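
The required directions of propagation can be stated compactly as a set computation over the hierarchy. The sketch below uses the IS-A links of Figure 19 and computes which concepts should eventually receive activation. It says nothing about how the network achieves this; the names PARENTS, ancestors, descendants, and reached are illustrative.

```python
# IS-A links of Figure 19, written as child -> list of parents
PARENTS = {"Bird": ["Animal"], "Cat": ["Animal"], "Robin": ["Bird"],
           "Canary": ["Bird"], "Tweety": ["Canary"], "Chirpy": ["Robin"],
           "Sylvester": ["Cat"]}


def ancestors(concept):
    """Return all superconcepts of a concept."""
    found = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(PARENTS.get(node, []))
    return found


def descendants(concept):
    """Return all concepts that have the given concept as an ancestor."""
    return {c for c in PARENTS if concept in ancestors(c)}


def reached(concept, existential):
    """Concepts that should receive activation originating at `concept`.

    An entity or a universally quantified type spreads activation upward
    only; an existentially quantified type also reaches its descendants
    and the ancestors of those descendants.
    """
    result = {concept} | ancestors(concept)
    if existential:
        below = descendants(concept)
        result |= below
        for d in below:
            result |= ancestors(d)
    return result


universal_bird = reached("Bird", existential=False)
existential_bird = reached("Bird", existential=True)
```

For the type Bird, the universal case reaches only Bird and Animal, while the existential case additionally reaches Robin, Canary, Tweety, and Chirpy.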

5.2. Encoding of the IS-A hierarchy

Figure 21. Structure of the concept cluster for C and its
interaction with the bottom-up and top-down switches. The
cluster has three banks of nodes and is capable of storing up to
three distinct instances of the concept (in other words, the
multiple instantiation constant k1 equals three). The ↑ and ↓
relay nodes have a threshold of 2.

Each concept C represented in the IS-A hierarchy is
encoded by a group of nodes called the concept cluster for
C (see middle of Fig. 21). The concept cluster for C has k1
banks of ρ-btu nodes, where k1 is the multiple instantiation
constant and refers to the number of dynamic instantiations
a concept can accommodate. In general, the value
of k1 may vary from concept to concept, but for ease of
exposition we will assume that it is the same for all
concepts. In Figure 21, k1 is three. Each bank of concept
C consists of three ρ-btu nodes: C_i, C_i↑, and C_i↓. Each C_i can
represent a distinct (dynamic) instantiation of C. The relay
nodes C_i↑ and C_i↓ control the direction of the propagation
of activation from C_i. The nodes C_i↑ and C_i↓ have a
threshold of two. Note that C_i is connected to C_i↑ and
C_i↓, and C_i↓ is linked to C_i↑.

Every concept C is associated with two subnetworks -
the top-down and bottom-up switches. These switches
are identical in structure and automatically control the
flow of activation to the concept cluster. A switch has k1
outputs. Output_i (1 ≤ i ≤ k1) from the bottom-up switch
connects to C_i and C_i↑, whereas output_i from the top-down
switch goes to nodes C_i and C_i↓. The bottom-up
switch has k1·n_sub inputs and the top-down switch has
k1·n_sup inputs, where n_sub and n_sup are the number of sub-
and superconcepts of C, respectively. There are also links
from the C_i nodes to both switches. The interaction
between the switches and the concept cluster brings
about efficient and automatic dynamic allocation of banks
in the concept cluster by ensuring that (1) activation gets
channeled to the concept cluster banks only if any “free”
banks are available, and (2) each instantiation occupies
only one bank.

Figure 22. Architecture of a switch that mediates the flow of
activation into concept clusters. The depicted switch assumes
that the associated cluster can represent up to three instances.
The switch provides a built-in and distributed control mechanism
for automatically allocating banks within a concept cluster.
Each distinct incoming instantiation is directed to a distinct
bank, provided one is available.

The architecture of the switch (with k1 = 3) is illustrated in
Figure 22. The k1 ρ-btu nodes, S_1, ..., S_k1, with their associated
τ-or nodes form the switch. Inputs to the switch make two
connections - one excitatory and one inhibitory - to each of
S_2, ..., S_k1. As a result of these excitatory-inhibitory connections,
nodes S_2, ..., S_k1 are initially disabled and cannot
respond to incoming activation. Any input activation only affects
node S_1, because the switch inputs directly connect to S_1. S_1
becomes active in response to the first available input and
continues to fire in phase with the input as long as the input
persists. As S_1 becomes active, the τ-or node associated with S_1
turns on and enables S_2. However, inhibitory feedback from C_1
ensures that S_2 is not enabled in the phase in which C_1 is firing.
Thus S_2 can start firing only in a different phase. Once S_2 starts
firing, S_3 gets enabled, and so on.

Note that C_i could receive input in two phases, one from its
bottom-up switch and another from its top-down switch. C_i,
being a ρ-btu node, fires in only one of these phases. At any
stage, if C_i, 1 ≤ i ≤ k1, picks up activation channeled by the other
switch, feedback from C_i into the τ-or node associated with S_i
causes S_i+1 to become enabled, even though S_i may not be
firing. The net result is that as instantiations occur in the concept
cluster, the ρ-btu nodes in the switch get enabled, in turn, from
left to right in distinct phases.
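
The net effect of the switch, namely that each distinct incoming phase is assigned to the next free bank from left to right, that a phase already represented occupies no second bank, and that nothing is allocated once all banks are in use, can be summarized by the following sketch. It models only this outcome and not the excitatory-inhibitory circuitry; the function name allocate_banks is illustrative.

```python
def allocate_banks(incoming_phases, k1=3):
    """Assign distinct incoming phases to banks, left to right.

    incoming_phases: phases in the order in which activation arrives.
    k1:              multiple instantiation constant (number of banks).
    Returns the list of phases held by banks 1, 2, ..., in order.
    """
    banks = []
    for phase in incoming_phases:
        if phase in banks:
            continue  # each instantiation occupies only one bank
        if len(banks) < k1:
            banks.append(phase)  # the next enabled switch node picks it up
        # otherwise no free bank is available and the activation is dropped
    return banks


three_banks = allocate_banks([2, 5, 2, 7, 9])
one_instance = allocate_banks([4, 4, 4])
```

With k1 = 3, the arrivals in phases 2, 5, 2, 7, and 9 fill the banks with phases 2, 5, and 7; the repeated phase 2 is not stored twice and phase 9 finds no free bank.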

An IS-A relation of the form is-a(A, B) is represented as
shown in Figure 23 by (1) connecting the A_i↑, i = 1, ..., k1,
nodes to the bottom-up switch for B, and (2) connecting the B_i↓,
i = 1, ..., k1, nodes to the top-down switch for A.

Consider a concept C in the IS-A hierarchy. Suppose C_i
receives activation from the bottom-up switch in phase p. In
response, C_i starts firing in synchrony with this activation. The
C_i↑ node now receives two inputs in phase p (one from the
bottom-up switch and another from C_i; see Fig. 21). Since it has
a threshold of 2, C_i↑ also starts firing in phase p. This causes
activation in phase p to eventually spread to the superconcept of
C. Hence any upward-traveling activation continues to travel
upward, which is the required behavior when C is associated
with a universally typed variable. Similarly, when C_i receives
activation from the top-down switch in phase p, both C_i and C_i↓
become active in phase p. C_i↑ soon follows suit because of the
link from C_i↓ to C_i↑. Thus eventually the whole ith bank starts
firing in phase p. This built-in mechanism allows a concept
associated with an existentially typed variable to eventually


Figure 23. Encoding of the IS-A relation. A bundle of k_1 links
is shown as a single link.
of descendants. The switching mechanism introduces an extra
delay in the propagation of activation along IS-A links and,
typically, the switch takes three steps to channel the activation.
In the worst - and also the least likely - case, the switch may
take up to eight steps to propagate activation.

The time taken to perform inferences in the integrated
system is also independent of the size of the LTKB and is
proportional only to the length of the shortest derivation of the
query. The time taken to perform a predictive inference is
approximately l_1 π + 3 l_2 π, where l_1 and l_2 are the lengths of the
shortest chain of rules, and the shortest chain of IS-A links,
respectively, that must be traversed in order to perform the
inference. The time required to answer a yes-no query is
approximately … + l_1 π + 2π.
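
As a rough illustration of the predictive-inference estimate, the sketch below evaluates it for given chain lengths. It assumes the expression reads l_1·π + 3·l_2·π (one period per rule and three switch steps per IS-A link); the function name and the sample values are hypothetical and not taken from the source.

```python
def predictive_inference_time(l_rules, l_isa, period_ms):
    """Approximate time (msec) for a predictive inference.

    Assumes the estimate l_1*pi + 3*l_2*pi: one period of oscillation per
    rule in the chain and three periods per IS-A link (the typical number
    of steps the switch takes to channel activation).
    """
    return (l_rules + 3 * l_isa) * period_ms


# Hypothetical example: 2 rules, 1 IS-A link, 20 msec period.
example_ms = predictive_inference_time(2, 1, 20.0)
```

The estimate depends only on the two chain lengths and the period, not on the number of items in the LTKB, which is the point the paragraph makes.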

5.3. Typed variables in queries

Consider a query P(. . . , x, . . .)?, where the jth argument
is specified as a typed variable x. If x is universally
quantified, that is, the query is of the form ∀x:C P(. . . ,
x, . . .), C_i and C_i↑ are activated in phase with the jth
argument node of P (the subscript i refers to one of the
banks of C). If x is existentially quantified, that is, the
query is of the form ∃x:C P(. . . , x, . . .), C_i and C_i↓ are
activated in phase with the jth argument node of P. As
before (sect. 4.3), an untyped variable in a query is not
activated. Simple queries of the type is-a(C, D)? are
posed by simply activating the nodes C_i and C_i↑ and
observing whether one or more D_i's become active.

5.4. Encoding appropriateness as type restrictions on
argument fillers

The IS-A hierarchy can be used to impose type restrictions
on variables occurring in rules. This allows the
system to encode context-dependent rules that are sensitive
to the types of the argument fillers involved in
particular situations. Figure 24 shows the encoding of the
following rule in a forward-reasoning system: ∀x:animate,
y:solid-obj walk-into(x,y) ⇒ hurt(x) (i.e., if an animate
agent walks into a solid object, the agent gets hurt). The
types associated with variables specify the admissible
types (categories) of fillers, and the rule is expected to fire
only if the fillers bound to the arguments are of the
appropriate type. The encoding makes use of τ-or nodes
that automatically check whether the filler of an argument
is of the appropriate type. Thus the τ-or node a in Figure
24 would become active if and only if the first argument of
walk-into is firing in synchrony with animate, indicating
that the filler of the argument is of type animate. Similarly,
the τ-or node b would become active if and only if
the second argument of walk-into is firing in synchrony
with solid-obj, indicating that the filler of this argument
is of type solid-obj. The activation of nodes a and
b would enable the propagation of activity from the
antecedent to the consequent predicate. In a forward
reasoner, typed variables are allowed only in the antecedent
of the rule.
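
The type check performed by the τ-or nodes can be pictured with a small sketch. It is only an abstraction, not the network itself: phases are modeled as integers, each concept is given the set of phases in which it is currently firing, and the argument names ("walker", "obstacle", "hurt-agent") are hypothetical labels introduced for the example.

```python
def tau_or_active(arg_phase, concept_phases):
    """True iff the argument fires in a phase in which the type concept fires."""
    return arg_phase is not None and arg_phase in concept_phases


def fire_typed_rule(arg_phases, concept_phases):
    """Sketch of: forall x:animate, y:solid-obj  walk-into(x,y) => hurt(x).

    arg_phases maps the (hypothetical) argument names of walk-into to the
    phase of their filler; concept_phases maps a concept to the set of
    phases in which it is active.
    """
    a = tau_or_active(arg_phases.get("walker"),
                      concept_phases.get("animate", set()))
    b = tau_or_active(arg_phases.get("obstacle"),
                      concept_phases.get("solid-obj", set()))
    if a and b:
        # Both checks pass: the binding of x propagates to hurt.
        return {"hurt-agent": arg_phases["walker"]}
    return {}


# John fires in phase 1, the wall in phase 2.
bindings = {"walker": 1, "obstacle": 2}
fired = fire_typed_rule(bindings, {"animate": {1}, "solid-obj": {2}})
blocked = fire_typed_rule(bindings, {"animate": set(), "solid-obj": {2}})
```

Here `fired` carries the phase of the first argument to the consequent, whereas `blocked` is empty because nothing of type animate is firing in phase 1.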

In the backward reasoner, typed variables are allowed
only in the consequent of a rule. The encoding of a typed
universally quantified variable in the consequent is similar
to the encoding of an entity in the consequent of a rule
explained in section 4.5 (see Fig. 14). Instead of originating
at an entity, the inhibitory link originates at the
concept representing the type of the universally quantified
variable. The encoding of a typed existentially
quantified variable is similar to that of a typed universally
quantified variable except that the inhibitory link originates
from a unique subconcept of the associated concept
(for details refer to Mani & Shastri 1991).

Figure 24. Encoding rules with typed variables: The network
fragment encodes the rule ∀x:animate, y:solid-obj walk-into(x,
y) ⇒ hurt(x). The numbers associated with nodes denote
thresholds (only thresholds other than 1 have been indicated
explicitly). The τ-or node a (b) becomes active if and only if the
first (second) argument node of walk-into fires in synchrony with
the concept animate (solid-obj). Once active, these nodes enable
the propagation of binding to the predicate hurt. Thus type
restrictions are enforced using temporal synchrony.

The rule ∀x:animate, y:solid-obj walk-into(x,y) ⇒ hurt(x) is
logically equivalent to the rule ∀x,y animate(x) ∧ solid-obj(y) ∧
walk-into(x,y) ⇒ hurt(x). Thus it would appear that the IS-A
hierarchy is not essential for encoding type restrictions on rules.
Note, however, that although the former variant has only one
predicate in the antecedent, the latter has three. This increase
in the number of antecedent predicates can be very costly,
especially in a forward-reasoning system capable of supporting
multiple dynamic predicate instantiations (Mani & Shastri
1992). In such a system, the number of nodes required to encode
a rule is proportional to k_2^m, where k_2 is the bound on the number
of times a predicate may be instantiated dynamically during
reasoning (see sect. 6), and m equals the number of predicates in
the antecedent of the rule. Thus it is critical that m be very
small. The IS-A hierarchy plays a crucial role in reducing the
value of m by allowing restrictions on predicate arguments to be
expressed as type restrictions.
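
To make the cost argument concrete, the following sketch compares the two variants of the walk-into rule, assuming the node count grows as k_2^m with the constant of proportionality ignored. The function name is hypothetical; k_2 = 3 is the value used for illustration in Figure 25.

```python
def relative_rule_cost(k2, m):
    """Node count for a rule, up to a constant factor, assuming k2**m growth.

    k2: bound on dynamic instantiations per predicate.
    m:  number of predicates in the antecedent of the rule.
    """
    return k2 ** m


typed_variant = relative_rule_cost(3, 1)    # walk-into only
untyped_variant = relative_rule_cost(3, 3)  # animate, solid-obj, walk-into
```

With k_2 = 3 the untyped variant is nine times as expensive as the typed one, and the gap widens quickly as k_2 grows.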

5.5. Encoding soft and defeasible rules

The proposed solution to the binding problem can be
generalized to soft/defeasible rules. Observe that the
strength of dynamic bindings may be represented by the
degree of synchronization between an argument and a
filler (this possibility was suggested by Jed Harris, personal
communication). Such a scheme becomes plausible
if each argument is encoded by an ensemble of nodes (see
sect. 7.3), for then the degree of coherence in the phase of
firing of nodes within an argument ensemble can indicate
the strength of the binding the argument is participating
in. In the limiting case, a highly dispersed activity in an
argument ensemble may mean that the argument is
bound to one of the active entities, although it is not clear
which (Shastri 1993a).

In addition to specifying a mechanism for representing
the strength of dynamic bindings and rule firing, we also
need to specify the basis for computing these strengths. It
is customary to view the strength of a rule as a numerical
quantity associated with a rule (e.g., certainty factors in
MYCIN; Buchanan & Shortliffe 1984). Such an “atomic” and
uninterpreted view of the strength of a rule is inadequate
for modeling rules involving n-ary predicates. Our approach
involves defining the “strength” of a rule (and
similarly, the strength of a binding) to be a dynamic
quantity that depends upon the types/features of the
entities bound to arguments in the rule at the time of rule
application. Such a strength measure also generalizes the
notion of type restrictions on rules to type preferences on
rules. Thus instead of rules of the form ∀x:animate,
y:solid-obj walk-into(x,y) ⇒ hurt(x), the system can encode
rules of the form:

∀x,y walk-into(x,y) ⇒ [with strength
σ(type(x), type(y))] hurt(x),

where the strength of the rule may vary from one situation
to another as a function, σ, of the types of the argument
fillers in a given situation. Observe that the value of σ(t_i,
t_j) need not be known for all types t_i and t_j in the IS-A
hierarchy, and may be inherited. For example, if σ(t_1, t_2)
is not known but σ(t_m, t_n) is, and t_1 and t_2 are subtypes of
t_m and t_n, respectively, then σ(t_m, t_n) can be used in place
of σ(t_1, t_2). This is analogous to property inheritance in an
IS-A hierarchy, where property values may be attached to
just a few concepts and the property values of the rest of
the concepts inferred via inheritance. The proposed
treatment would allow the system to incorporate exceptional
and default information during reasoning. This
relates to Shastri's (1988a; 1988b) work on a connectionist
semantic network (see sect. 9.2).
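
The inheritance of rule strengths can be sketched as a lookup that climbs the IS-A hierarchy. This is a minimal sketch under stated assumptions, not the connectionist mechanism: the hierarchy is taken to be a tree given as a child-to-parent dictionary, the nearest known pair of supertypes wins, and all type names and strength values are invented for the example.

```python
def ancestors(t, parent):
    """Return [t, parent(t), grandparent(t), ...] in a tree-shaped hierarchy."""
    chain = [t]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def inherited_strength(t1, t2, known, parent):
    """Strength for (t1, t2), inherited from the nearest known supertype pair.

    known maps (type, type) pairs to strengths; returns None if no pair of
    supertypes (including t1 and t2 themselves) has a known strength.
    """
    c1, c2 = ancestors(t1, parent), ancestors(t2, parent)
    # Try pairs in order of combined distance from (t1, t2).
    pairs = sorted((i + j, i, j) for i in range(len(c1)) for j in range(len(c2)))
    for _, i, j in pairs:
        if (c1[i], c2[j]) in known:
            return known[(c1[i], c2[j])]
    return None


parent = {"cat": "animate", "wall": "solid-obj"}
known = {("animate", "solid-obj"): 0.9, ("cat", "pillow"): 0.1}
default_case = inherited_strength("cat", "wall", known, parent)
exception_case = inherited_strength("cat", "pillow", known, parent)
```

The pair (cat, wall) inherits the strength attached to (animate, solid-obj), while the more specific entry for (cat, pillow) is used directly, which is how exceptions can coexist with defaults.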


6.  Representing  multiple  dynamic  instantiations 
of  a  predicate 

The representation of dynamic bindings described thus
far cannot simultaneously represent multiple dynamic
facts about the same predicate. The proposed representation
can be extended to do so by generalizing the scheme
for encoding multiple instantiations of concepts outlined
in section 5.2. The extension assumes that during an
episode of reflexive reasoning each predicate can be
instantiated only a bounded number of times. In general,
this bound may vary across predicates and some critical
predicates may have a marginally higher bound. For ease
of exposition, however, we will assume that this bound is
the same for all predicates and refer to it as k_2. The ability
to handle multiple instantiations of the same predicate
allows the system to deal with more complex inferential
dependencies, including circularities and bounded recursion.
The system can make use of rules such as ∀x,y
sibling(x,y) ⇒ sibling(y,x). A forward-reasoning system
can use a rule such as ∀x,y,z greater(x,y) ∧ greater(y,z)
⇒ greater(x,z) and infer “a is greater than c” on being told
“a is greater than b” and “b is greater than c.”

Since up to k_2 dynamic instantiations of a predicate may
have to be represented simultaneously, the representation
of an n-ary predicate is augmented so that each
predicate is represented by k_2 banks of nodes, with each
bank containing a collector, an enabler, and n argument
nodes. For a given predicate P, the enabler of the ith bank,
e:P_i, will be active whenever the ith bank has been instantiated
with some dynamic binding. The collector c:P_i of the
ith bank will be activated whenever the dynamic bindings
in the ith bank match the knowledge encoded in the
LTKB. Figure 25 depicts the encoding of two binary
predicates P and Q and a ternary predicate R.

Given that a predicate is represented using multiple
banks of predicate and argument nodes, the connections
between arguments of the antecedent and consequent
predicates of a rule have to be mediated by a “switching”
mechanism similar to the one described in section 5.2.
The switch automatically channels input instantiations
into available banks of its associated predicate. It also
ensures that each distinct instantiation occupies only one
bank, irrespective of the number of consequent predicates
that may be communicating this instantiation to the
switch.
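
The behavior required of the switch can be summarized in a few lines. The sketch below is a functional abstraction only (the actual realization uses p-btu and τ-or nodes firing in distinct phases); the class and method names are hypothetical.

```python
class PredicateSwitch:
    """Channels incoming instantiations into at most k2 banks of a predicate."""

    def __init__(self, k2):
        self.k2 = k2
        self.banks = []  # one tuple of argument fillers per occupied bank

    def offer(self, instantiation):
        """Return the bank index holding the instantiation, or None if full.

        The same instantiation arriving from several predicates still
        occupies only one bank.
        """
        inst = tuple(instantiation)
        if inst in self.banks:
            return self.banks.index(inst)
        if len(self.banks) >= self.k2:
            return None  # bound on dynamic instantiations reached
        self.banks.append(inst)
        return len(self.banks) - 1


switch = PredicateSwitch(k2=3)
first = switch.offer(("a", "b"))     # greater(a, b)
second = switch.offer(("b", "c"))    # greater(b, c)
repeat = switch.offer(("a", "b"))    # duplicate, reuses its bank
third = switch.offer(("a", "c"))     # inferred greater(a, c)
overflow = switch.offer(("c", "d"))  # no bank left
```

With three banks the transitivity example from the text fits exactly: the two given facts and the inferred one each take a bank, and a fourth distinct instantiation is dropped.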

With the inclusion of the switch in the backward reasoning
system, the number of nodes required to represent
a predicate and a long-term fact becomes proportional
to k_2 and the number of nodes required to encode a
rule becomes proportional to k_2^2. Furthermore, the time
required for propagating multiple instantiations of a predicate
increases by a factor of k_2. Thus there are significant
space and time costs associated with multiple instantiation
of predicates. The complete realization of the switch
and its interconnection is described in Mani and Shastri
(1992).

Figure 25. The encoding of predicates for accommodating
multiple instantiations: P and Q are binary predicates and R is a
ternary predicate. The encoding assumes that any predicate
may be instantiated at most three times (i.e., the multiple
instantiation constant k_2 = 3). An n-ary predicate is represented
by k_2 banks of nodes. The connections suggest that there are two
rules, one of the form P( ) ⇒ Q( ) and the other of the form P( ) ⇒
R( ) (the argument correspondence is not shown). The connections
between antecedent and consequent predicates of a rule
are mediated by a “switching” mechanism similar to the one
described in Figure 22. The switch for P automatically channels
incoming instantiations of P into available banks of P. The switch
has k_2 output “cables,” where each cable consists of output links
to a predicate bank of P. The inputs to the switch are cables from
banks of predicates that are in the consequent of rules in which P
occurs in the antecedent.

7.  Biological  plausibility 

Recent neurophysiological data suggest that synchronous,
rhythmic activity occurs in the brain and that the
time course of such activity is consistent with the requirements
of reflexive reasoning. The data also provide evidence
in support of the hypothesis that the cat visual
system solves the dynamic-binding problem by using
temporal synchrony.

7.1. Neurophysiological support

There is considerable evidence for the existence of rhythmic
activity in the animal brain. Synchronous activity has
been documented for some time in the olfactory bulb,
hippocampus, and the visual cortex (Freeman 1981; Gerstein
1970; MacVicar & Dudek 1980; Toyama et al. 1981).
The most compelling evidence for such activity, however,
comes from findings of synchronous oscillations in the
visual cortex of anesthetized cats responding to moving
visual stimuli (Eckhorn et al. 1988; 1990; Engel et al.
1991; Gray & Singer 1989; Gray et al. 1989; 1991). These
findings are based on the correlational analysis of local
field potentials and multiunit, as well as single-unit,
recordings. Recently, similar activity has also been reported
in an awake and behaving monkey (Kreiter &
Singer 1992). Relevant aspects of the experimental findings
are summarized below:

1. Synchronous oscillations have been observed at frequencies
ranging from 30 to 80 Hz (a typical frequency is
around 50 Hz).

2. Synchronization of neural activity can occur within a
few periods (sometimes even one period) of oscillations
(Gray et al. 1991).

3. In most cases synchronization occurs with a lag or
lead of less than 3 msec, although in some cases it even
occurs with precise phase locking (Gray et al. 1991).

4. Synchronization of neural activity occurs (a) between
local cortical cells (Eckhorn et al. 1988; Gray &
Singer 1989), (b) among distant cells in the same cortical
area (Gray et al. 1989), (c) among cells in two different
cortical areas - for example, areas 17 and 18 (Eckhorn et
al. 1988) and areas 17 and PMLS (Engel et al. 1991), and
(d) among cells across the two hemispheres (Engel et al.
1991).

5. Once achieved, synchrony may last several hundred
msec (Gray et al. 1991).

The synchronous activity observed in the brain is a
complex and dynamic phenomenon. The frequency and
degree of phase locking vary considerably over time and
the synchronization is most robust when viewed as a
property of groups of neurons. The nature of synchronous
activity assumed by our model is an idealization of such a
complex phenomenon (but see sects. 7.3 & 10.1-10.4).


7.1.1. Temporal synchrony and dynamic bindings in the
cat visual cortex. Neurophysiological findings also support
the hypothesis that the dynamic binding of visual
features pertaining to a single object may be realized by
the synchronous activity of cells encoding these features
(see Eckhorn et al. 1990; Engel et al. 1991). In one
experiment, multiunit responses were recorded from
four different sites that had overlapping receptive fields
but different orientation preferences - 157°, 67°, 22°, and
90°, respectively. A vertical light bar resulted in a synchronized
response at sites 1 and 3, whereas a light bar
oriented at 67° led to a synchronized response at sites 2
and 4. A combined stimulus with the two light bars
superimposed led to activity at all the four sites, but
although the activity at sites 1 and 3 was synchronized and
that at sites 2 and 4 was synchronized, there was no
correlation in the activity across these pairs of sites (Engel
et al. 1991). Such experimental evidence suggests that the
synchronous activity in orientation-specific cells may be
the brain’s way of encoding that these cells are participating
in the representation of a single object. This is analogous
to the situation in Figure 9, wherein the synchronous
activity of the nodes recip, owner, and cs-seller
in phase with Mary is the system’s way of encoding that all
these roles are being filled by the same object, Mary.

7.2.  Some  neurally  plausible  values  of  system 
parameters 

The neurophysiological data cited above also provide a
basis for making coarse but neurally plausible estimates of
some system parameters. The data indicate that plausible
estimates of π_min and π_max may be 12 and 33 msec,
respectively, and a typical value of π may be 20 msec. The
degree of synchronization varies from episode to episode,
but a conservative estimate of w, the width of the window
of synchrony, may be derived on the basis of the cumulative
histogram of the phase difference (lead or lag) observed
in a large number of synchronous events. The
standard deviation of the phase differences was 2.6 msec
in one data set and 3 msec in another (Gray et al. 1991).
Thus a plausible estimate of w may be about 6 msec.
Given that the activity of nodes can get synchronized
within a few cycles, sometimes even within one, and
given the estimates of π_min and π_max, it is plausible that
synchronous activity can propagate from one p-btu node
to another in about 50 msec. The data also suggest that
synchronous activity lasts long enough to support episodes
of reflexive reasoning requiring several steps.
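
The period estimates follow directly from the oscillation frequencies reported in section 7.1 (30 to 80 Hz, typically around 50 Hz). The conversion below is a small worked check; the function name is hypothetical.

```python
def period_ms(frequency_hz):
    """Period of oscillation in msec for a frequency given in Hz."""
    return 1000.0 / frequency_hz


shortest_period = period_ms(80)  # upper end of the observed frequency range
longest_period = period_ms(30)   # lower end of the observed frequency range
typical_period = period_ms(50)   # typical observed frequency
```

An 80 Hz oscillation has a period of 12.5 msec, a 30 Hz oscillation about 33.3 msec, and a 50 Hz oscillation 20 msec, in line with the estimates of 12, 33, and 20 msec quoted above.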

7.3.  Propagation  of  synchronous  activity  -  a  provisional 
model 

Our system requires the propagation of synchronous
activity over interconnected nodes in spite of nonzero and
noisy propagation delays. The neurophysiological evidence
cited in the previous sections suggests that such
propagation occurs in the cortex. The exact neural mechanisms
underlying the propagation of such activity, however,
remain to be determined. Abeles (1982; 1991) has
argued, on the basis of anatomical and physiological data,
theoretical analysis, and simulation results that synchronous
activity can propagate over chains of neurons
connected in a many-to-many fashion (synfire chains) with
a small and stable “jitter,” even if random fluctuations are
taken into account. Bienenstock (1991) has examined how
synfire chains may arise in networks as a result of learning.
Below we outline a provisional model (Mandelbaum
& Shastri 1990) that demonstrates that synchronized
activity can propagate in spite of noisy propagation delays.
The model is meant to demonstrate the feasibility of
such propagation and should not be viewed as a detailed
neural model.

We assume that each argument in the reasoning system
is represented by an ensemble of n nodes rather than just
a single node. Connections between arguments are encoded
by connecting nodes in the appropriate ensembles:
If ensemble A connects to ensemble B, then each node in
A is randomly connected to m nodes in B (m ≤ n). Thus,
on average, each node in B receives inputs from m nodes
in A (see Fig. 26) and has a threshold comparable to m.
The propagation delay between nodes in two different ensembles
is assumed to be noisy and is modeled as a
Gaussian distribution. If ensemble A is connected to
ensemble B and nodes in ensemble A are firing in synchrony,
then we desire that within a few periods of
oscillation nodes in ensemble B start firing in synchrony
with nodes in ensemble A.
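
The interensemble wiring just described can be sketched as follows. This is a minimal illustration, assuming uniform random choice of targets without repetition; the function names and the seed are hypothetical, and the sizes n = 64 and m = 20 are the values reported for the sample simulation later in this section.

```python
import random


def connect_ensembles(n, m, seed=0):
    """Connect each of the n nodes in A to m distinct random nodes in B."""
    rng = random.Random(seed)
    return [rng.sample(range(n), m) for _ in range(n)]


def mean_in_degree(links, n):
    """Average number of inputs a node in B receives from nodes in A."""
    counts = [0] * n
    for targets in links:
        for b in targets:
            counts[b] += 1
    return sum(counts) / n


def sample_delay(rng, mean_delay, std_delay):
    """One noisy propagation delay drawn from a Gaussian (may be clipped by callers)."""
    return rng.gauss(mean_delay, std_delay)


links = connect_ensembles(n=64, m=20)
average_inputs = mean_in_degree(links, 64)
```

Because A sends n·m links in total and B has n nodes, the average in-degree is exactly m, which is why a threshold comparable to m is a natural choice.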

Nodes within an ensemble are also sparsely interconnected,
with each node receiving inputs from a few
neighboring nodes within the ensemble. Synchronization
within an ensemble is realized as a result of the interaction
between the feedback received by a node from its
neighbors within the ensemble. The model makes use of
the negative slope of the threshold-time characteristic
during the relative refractory period (RRP) to modulate
the timing of the spike generated by a node. Observe that
a higher excitation can hasten, while a lower excitation
can delay, the firing of a node. At the same time, the lag
between the firing times of nodes in two connected
ensembles due to the mean propagation delay is partially
offset by the interaction between the various parameters
of the system enumerated below.

Figure 26. Individual p-btu nodes are replaced by an ensemble
of such nodes. A connection between a pair of individual
p-btu nodes is replaced by a number of random interensemble
connections. Nodes within an ensemble can communicate with
their immediate neighbors in the ensemble and the intraensemble
propagation delays are assumed to be much smaller than
the interensemble propagation delays.

The threshold-time characteristic of a node and the
distribution of the arrival times of input spikes from a
preceding ensemble are illustrated in Figure 27. After
firing, a node enters an absolute refractory period (ARP),
in which its threshold is essentially infinite. The ARP is
followed by a decaying RRP, during which the threshold
decays to its resting value. During the RRP, the
threshold-time characteristic is approximated as a straight
line of gradient g (a linear approximation is not critical).
The incoming spikes from a preceding ensemble arrive
during a node’s ARP and the early part of its RRP. It is
assumed that immediate neighbors within an ensemble
can rapidly communicate their potential to each other.

A node’s potential is the combined result of the interensemble
and intraensemble interactions and in the period
between spikes is modeled as

$$P_i(t + \Delta t) = P_i(t) + \mathrm{In}_i(t) + \alpha \sum_j \left[ P_j(t) - P_i(t) \right],$$

where P_i(t) is the potential of node i at time t. The change
in potential is caused by In_i(t), the input arriving at node i
from nodes in the preceding ensemble, as well as the
difference in the potential of i and that of its immediate
neighbors j. In the simulation, j ranged over six immediate
neighbors of i. If nodes i and j are immediate neighbors
and i is firing ahead of j, then we want i to hasten the
firing of j by sending it an excitatory signal and j to delay
the firing of i by sending it an inhibitory signal. Doing so
would raise the potential of j, causing it to fire early, and
lower the potential of i, causing it to fire later in the next
cycle. Thus i and j would tend to synchronize.
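
One update step of the potential can be written out directly. The sketch assumes the update rule reads P_i(t + Δt) = P_i(t) + In_i(t) + α Σ_j [P_j(t) − P_i(t)], with j ranging over the immediate neighbors of i; it omits thresholds, refractory periods, and spike generation, and the function name and the two-node example are hypothetical.

```python
def update_potentials(potentials, inputs, neighbors, alpha):
    """Advance all node potentials by one time step.

    potentials[i]: P_i(t); inputs[i]: In_i(t), the interensemble input;
    neighbors[i]: indices of the immediate neighbors of node i;
    alpha: coupling factor between immediate neighbors.
    """
    new_potentials = []
    for i, p_i in enumerate(potentials):
        # Neighbors with higher potential pull node i up, lower ones pull it down.
        coupling = sum(potentials[j] - p_i for j in neighbors[i])
        new_potentials.append(p_i + inputs[i] + alpha * coupling)
    return new_potentials


# Node 0 is ahead of node 1; no external input in this step.
step = update_potentials([1.0, 0.0], [0.0, 0.0], {0: [1], 1: [0]}, alpha=0.1)
```

In the example the leading node's potential drops while the lagging node's potential rises, so the gap between them narrows, which is the synchronizing tendency described in the text.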

The results of a sample simulation are shown in Figure
28. The diagram shows the cycle-by-cycle distribution of
the firing times of nodes within a “driven” ensemble that
is being driven by a “driver” ensemble whose nodes are
oscillating in phase lock. Δt was chosen to be 0.001 time
units (i.e., all calculations were done every 1/1000 of a time
unit), where a unit of time may be assumed to be 1 msec.
Other simulation parameters were as follows: (1) n, the
number of nodes in an ensemble, equals 64; (2) m, the
interensemble connectivity, equals 20; (3) g, the slope of
the threshold during the RRP, equals 0.032; (4) α, the
“coupling” factor between immediate neighbors within an
ensemble, equals 0.07; (5) d, the average interensemble
propagation delay, equals 4.5 time units; (6) s, the standard
deviation of interensemble propagation delay,
equals 1.5 time units; and (7) π, the expected period of
oscillation, equals 10.5 time units.

Figure 27. The time course of a node’s threshold. After generating
a spike a node enters an absolute refractory period (ARP).
The ARP is followed by a relative refractory period (RRP). After
the RRP a node’s threshold reverts to its normal level. The
distribution of the arrival times of signals from a connected
ensemble is depicted by the shaded region. The noisy propagation
delays are modeled as a Gaussian.

Figure 28. The cycle-by-cycle distribution of the firing times of nodes within a “driven” ensemble being driven by a “driver”
ensemble whose nodes are firing in synchrony. The left-hand “wall” of the isometric diagram displays the standard deviation and mean
of the node firing times with reference to the ideal firing time. The nodes in the driven ensemble become synchronized in spite of
noisy propagation delays. The maximum lag in the firing times of nodes in the driven ensemble becomes less than 3 msec and the
mean lag becomes less than 1 msec within two cycles. By the end of seven cycles the maximum and mean lags reduce to 1 and 0.2
msec, respectively.

As shown in Figure 28, despite noisy propagation
delays, the maximum lag in the firing of nodes in the
“driven” ensemble becomes less than 3 msec and the
mean lag becomes less than 1 msec within two cycles. By
the end of seven cycles the maximum and mean lags
reduce to 1 and 0.2 msec, respectively.


8.  Psychological  implications 

In this section we examine the psychological implications
of our system, especially in view of the biologically motivated
estimates of the system parameters discussed in
section 7.3.

8.1.  A  neurally  plausible  model  of  reflexive  reasoning 

The proposed system can encode specific as well as
general instantiation-independent knowledge and perform
a broad range of reasoning with efficiency. The
system makes use of very simple nodes, and yet its node
requirement is only linear in the size of the LTKB (the
size being measured in terms of the number of predicates,
facts, rules, concepts, and IS-A relations). Thus the system
illustrates how a large LTKB may be encoded by
using  only  a  fraction  of  lO'^  nodes. 

The system demonstrates that a class of forward and
backward reasoning can be performed very rapidly, in
time independent of the size of the LTKB. Below we set
the values of appropriate system parameters to neurally
plausible values identified in section 7.3 and indicate the
time the system takes to perform certain inferences.
These times are, at best, broad indicators of the time we
expect the internal reasoning process to take. Also note
that they do not include the time that would be taken by
perceptual, linguistic, and motor processes to process and
respond to inputs.

8.1.1. Some typical retrieval and inference timings. Let us
choose π to be 20 msec and assume that p-btu nodes can
synchronize within two periods of oscillations. The system
takes 320 msec to infer “John is jealous of Tom” after
being given the dynamic facts “John loves Susan” and
“Susan loves Tom” (this assumes the rule, if x loves y and y
loves z then x is jealous of z). The system takes 260 msec to
infer “John is a sibling of Jack,” given “Jack is a sibling of
John.” Similarly, the system takes 320 msec to infer
“Susan owns a car” after its internal state is initialized to
represent “Susan bought a Rolls-Royce.” If the system’s
LTKB includes the long-term fact “John bought a Rolls-Royce,”
the system takes 140, 420, and 740 msec, respectively,
to answer yes to the queries Did John buy a Rolls-Royce?
Does John own a car? and Can John sell a car?

Thus our system demonstrates that a class of reasoning
can occur rapidly, both in the forward (predictive) mode
as well as backward (query answering) mode. The above
times are independent of the size of the LTKB and do not
increase when additional rules, facts, and IS-A relationships
are added to the LTKB. If anything, these times
may decrease if one of the additional rules is a composite
rule and short-circuits an existing inferential path. For
example, if a new rule “if x buys y then x can sell y” were to
be added to the LTKB, the system would answer the
query Can John sell a car? in 420 msec instead of 740
msec.

8.1.2. Variations in inference and retrieval times. Consider
two p-btu nodes A and B such that A is connected to B
(although we are referring to individual nodes, the following
comment would also apply if A and B were ensembles
of nodes). It seems reasonable to assume that the number
of cycles required for B to synchronize with A will depend
on the synaptic efficacy of the link from A to B. This
suggests that the time taken by the propagation of bindings
- and hence rule firing - will vary, depending on the
weights on the links between the argument nodes of the
antecedent and consequent predicates. Rules whose associated
links have high weights will fire and propagate
bindings faster than rules whose associated links have
lower weights. It also follows that different facts will take
different times to be retrieved, depending on the weights
of the links connecting the appropriate arguments and
filler concepts (see Fig. 10). Note that the inhibitory
signal from an argument will continue to block the activation
of a fact node until the signals from the filler concepts
and the argument get synchronized. Similarly, during the
processing of wh-queries, the time it would take for the
filler concepts to synchronize with the binder units will
depend on the weights of the links from the binder nodes
to the concept nodes (see Fig. 17). Thus the retrieval of
argument fillers would be faster if the weights on the
appropriate links are high. Observe that the variation in
times refers to the variations in the time it takes for nodes
to synchronize and not the time it takes for nodes to
become active. This suggests that the time course of
systematic inferences and associative priming may be
quite different.

8.2. Nature of reflexive reasoning

Our model suggests several constraints on the nature of
reflexive reasoning. These have to do with (1) the capacity
of the working memory underlying reflexive reasoning,
(2) the form of rules that may participate in such reasoning,
and (3) the depth of the chain of reasoning.


8.2.1. The working memory underlying reflexive reasoning.
Dynamic bindings and, hence, dynamic facts are
represented in the system as a rhythmic pattern of activity
over nodes in the LTKB network. In functional terms, this
transient state of activation can be viewed as a
limited-capacity dynamic working memory that temporarily holds
information during an episode of reflexive reasoning. Let
us refer to this working memory as the WMRR.

Our system predicts that the capacity of the WMRR is
very large and, at the same time, very limited! The
number of dynamic facts that can simultaneously be
present in the WMRR is high and is given by $k_2 p$, where
$k_2$ is the predicate multiple instantiation constant
introduced in section 6, and $p$ is the number of predicates
represented in the system. The number of concepts that
may be active simultaneously is also very high and equals
$k_1 c$, where $c$ is the number of concepts in the IS-A
hierarchy and $k_1$ is the multiple instantiation constant for
concepts introduced in section 5.2. But, as we discuss
below, there are two constraints that limit the number of
dynamic facts that may actually be present in the WMRR
at any given time.

8.2.2. Working memory, medium-term memory and overt
short-term memory. Before moving on let us make two
observations. First, the dynamic facts represented in the
WMRR during an episode of reasoning should not be
confused with the small number of short-term facts that
an agent may overtly keep track of during reflective
processing and problem solving. In particular, the
WMRR should not be confused with the (overt) short-term
memory implicated in various memory span tasks
(for review see Baddeley 1986). Second, our reasoning
system implies that a large number of dynamic facts will
be produced as intermediate results during reasoning and
would be represented in the WMRR. These facts, however,
are only potentially relevant and would remain
covert and decay in a short time unless they turn out to be
relevant in answering a “query” or providing an explanation.
We expect that only a small number of dynamic facts
would turn out to be relevant, and those that do would
“enter” a medium-term memory, where they would be
available for a much longer time (see sect. 10.5). Some of
these facts may also enter the overt short-term memory.
Note that this short-term memory need not be a physically
distinct module. It may simply consist of facts/entities
in the WMRR that are currently receiving an
attentional spotlight (cf. Crick 1984; Crick & Koch 1990a).

8.2.3. A bound on the number of distinct entities referenced
in the WMRR. During an episode of reasoning, each
entity involved in dynamic bindings occupies a distinct
phase in the rhythmic pattern of activity. Hence the
number of distinct entities that can occur as argument
fillers in the dynamic facts represented in the WMRR
cannot exceed $\lfloor \pi_{max} / \omega \rfloor$, where $\pi_{max}$ is the maximum
period (corresponding to the lowest frequency) at which
p-btu nodes can sustain synchronous oscillations, and $\omega$
equals the width of the window of synchrony. Thus the
WMRR may represent a large number of facts, as long as
these facts refer to only a small number of distinct entities.
Note that the activation of an entity together with all
its active superconcepts counts as only one entity.

In section 7.2 we pointed out that a neurally plausible
value of $\pi_{max}$ is about 33 msec and a conservative estimate
of $\omega$ is around 6 msec. This suggests that as long as the
number of entities referenced by the dynamic facts in the
WMRR is five or less, there will essentially be no cross-talk
among the dynamic facts. If more entities occur as
argument fillers in dynamic facts, the window of synchrony
$\omega$ would have to shrink in order to accommodate
all the entities. For example, $\omega$ would have to shrink to
about 5 msec in order to accommodate 7 entities. As $\omega$
shrinks, the possibility of cross-talk between dynamic
bindings would increase and eventually disrupt the reasoning
process. The exact bound on the number of distinct
entities that may fill arguments in dynamic facts
would depend on the smallest feasible value of $\omega$. Given
the noise and variation indicated by the data on synchronous
activity cited in section 7.1, it appears unlikely
that $\omega$ can be less than 3 msec. Hence we predict that a
neurally plausible upper bound on the number of distinct
entities that can be referenced by the dynamic facts
represented in the WMRR is about 10. This prediction is
consistent with our belief that most cognitive tasks performed
without deliberate thought tend to involve only a
small number of distinct entities at a time (though, of
course, these entities may occur in multiple situations
and relationships).
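
The arithmetic behind these estimates can be sketched in a few lines. This is only an illustration, assuming the bound is the floor of $\pi_{max} / \omega$ as stated above; the function names `max_entities` and `required_window` are ours and are not part of the model.

```python
import math

def max_entities(pi_max_msec: float, omega_msec: float) -> int:
    """Upper bound on distinct entities: floor(pi_max / omega)."""
    return math.floor(pi_max_msec / omega_msec)

def required_window(pi_max_msec: float, n_entities: int) -> float:
    """Widest window of synchrony that still leaves n_entities distinct phases."""
    return pi_max_msec / n_entities

PI_MAX = 33.0  # msec, the value of pi_max quoted from section 7.2

print(max_entities(PI_MAX, 6.0))             # conservative omega of 6 msec
print(max_entities(PI_MAX, 3.0))             # smallest omega considered plausible
print(round(required_window(PI_MAX, 7), 2))  # window needed for 7 entities
```

With these values the sketch gives 5 entities for a 6 msec window and 11 for a 3 msec window, in line with the “five or less” and “about 10” figures in the text; seven entities need a window of roughly 4.7 msec, which the text rounds to about 5 msec.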

It is remarkable that the bound on the number of
entities that may be referenced by the dynamic facts in
the WMRR relates so well to 7 ± 2, the robust measure of
short-term memory capacity (Miller 1956). This unexpected
coincidence merits further investigation as it suggests
that temporal synchrony may also underlie other
short-term and dynamic representations. Similar limitations
of the human dynamic-binding mechanism are also
illustrated in experimental work on the attribute-binding
problem (Stenning et al. 1988).

The bound on the number of distinct entities referenced
in the WMRR is independent of similar bounds on
the working memories of other subsystems. As we discuss
in section 10.4, dynamic structures in the working
memory of other subsystems may refer to different sets
of entities using phase distributions local to those subsystems.

8.2.4. A bound on the multiple instantiation of predicates.
The capacity of the WMRR is also limited by the constraint
that it may only contain a small number of dynamic
facts pertaining to each predicate. This constraint stems
from the high cost of maintaining multiple instantiations
of a predicate. As stated in section 6, in a backward-reasoning
system, if $k_2$ denotes the bound on the number
of times a predicate may be instantiated during an episode
of reasoning, then the number of nodes required to
represent a predicate and the associated long-term facts is
proportional to $k_2$, and the number of nodes required to
encode a rule is proportional to $k_2^2$. Thus a backward-reasoning
system that can represent three dynamic instantiations
of each predicate will have anywhere from
three to nine times as many nodes as a system that can
only represent one instantiation per predicate. In a
forward-reasoning system the cost is even higher and the
number of nodes required to encode a rule is $k_2^m$, where $m$
is the number of antecedents in the rule. The time
required for propagating multiple instantiations of a predicate
also increases by a factor of $k_2$. In view of the
significant space and time costs associated with multiple
instantiation and the necessity of keeping these resources
within bounds in the context of reflexive reasoning, we
predict that $k_2$ is quite small, perhaps no more than three.
As observed in section 6, $k_2$ need not be the same for all
predicates, and it is possible that some critical predicates
may have a slightly higher $k_2$.
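
To make the growth in node count concrete, the scaling described above can be tabulated. The sketch below reports only the multiplicative factors relative to a system with one instantiation per predicate; the constants of proportionality are not given in the text, and the function names are hypothetical.

```python
def predicate_factor(k2: int) -> int:
    """Relative node count for a predicate and its long-term facts (scales as k2)."""
    return k2

def backward_rule_factor(k2: int) -> int:
    """Relative node count for a rule in a backward-reasoning system (scales as k2**2)."""
    return k2 ** 2

def forward_rule_factor(k2: int, m: int) -> int:
    """Relative node count for a rule with m antecedents in a forward-reasoning
    system (scales as k2**m)."""
    return k2 ** m

for k2 in (1, 2, 3, 4):
    print(k2, predicate_factor(k2), backward_rule_factor(k2), forward_rule_factor(k2, 3))
```

For $k_2 = 3$ the factors are 3 for predicates and 9 for backward rules, which is the “three to nine times” range mentioned above, while a forward rule with three antecedents would already cost a factor of 27.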

8.2.5. Form of rules that may participate in reflexive reasoning.
In section 4.9 we pointed out that when answering
queries based on the long-term knowledge encoded in
the LTKB, our reflexive-reasoning system cannot use
rules that contain variables occurring in multiple argument
positions in the antecedent unless such variables
also appear in the consequent and get bound during the
query-answering process. A similar constraint applies to
forward (predictive) reasoning: When making predictions
based on given dynamic facts, a system cannot use a rule
that contains variables occurring in multiple argument
positions in the consequent, unless such variables also
appear in the antecedent and get bound during the
reasoning process. These constraints predict that certain
queries cannot be answered in a reflexive manner even
though the corresponding predictions can be made reflexively.
For example, consider an agent whose LTKB includes
the rule “if x loves y and y loves z then x is jealous of
z” and the long-term facts “John loves Mary” and “Mary
loves Tom.” Our system predicts that if this agent is asked
Is John jealous of Tom? she will be unable to answer the
query in a reflexive manner. Note that the antecedent of
the rule includes a repeated variable, y, that does not
occur in the consequent. Hence our system predicts that
answering this question will require deliberate and conscious
processing (unless the relevant long-term facts are
active in the WMRR for some reason at the time the query
is posed). However, an agent who has the above rule about
love and jealousy in its LTKB would be able to infer “John
is jealous of Tom” in a reflexive manner, on being “told”
“John loves Mary” and “Mary loves Tom.” This is because
such an inference involves forward (predictive) reasoning.
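
The static part of the repeated-variable restriction can be illustrated with a small check. This is a simplified sketch, not the system's mechanism: it treats every argument as a variable, ignores the condition that the variable must actually get bound during reasoning, and uses a rule encoding and function names that are our own assumptions.

```python
from collections import Counter

# A rule is (antecedents, consequent); each atom is (predicate, argument_variables).
# All arguments are treated as variables; constants and typed variables are ignored.

def repeated_vars(atoms):
    """Variables that occupy more than one argument position across the atoms."""
    counts = Counter(v for _, args in atoms for v in args)
    return {v for v, n in counts.items() if n > 1}

def usable_backward(antecedents, consequent):
    """Query answering: every variable repeated in the antecedent
    must also appear in the consequent."""
    return repeated_vars(antecedents) <= set(consequent[1])

def usable_forward(antecedents, consequent):
    """Prediction: every variable repeated in the consequent
    must also appear in the antecedent."""
    antecedent_vars = {v for _, args in antecedents for v in args}
    return repeated_vars([consequent]) <= antecedent_vars

jealousy = ([("loves", ("x", "y")), ("loves", ("y", "z"))],
            ("jealous-of", ("x", "z")))

print(usable_backward(*jealousy))  # y repeats in the antecedent, absent from the consequent
print(usable_forward(*jealousy))   # no variable repeats in the consequent
```

For the jealousy rule the check reports that backward use is ruled out and forward use is allowed, matching the asymmetry between answering Is John jealous of Tom? and inferring it from the two dynamic facts.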

As another example of the predictions made by the
constraint, assume that our agent’s conception of kinship
relations is one wherein the maternal/paternal distinction
at the grandparent level is not primary. Let us also assume
that the agent’s maternal grandfather is George. The
constraint predicts that the agent cannot answer yes to the
query Is George your maternal grandfather? in a reflexive
manner even though the agent may be able to answer the
question Is George your grandfather? in a reflexive manner
(this example is due to Feldman, personal communication).
The basis of this prediction is as follows: If “maternal
grandfather” is not a primary kinship relation then it
must be computed by using an appropriate rule. Given
the nature of the maternal-grandfather relationship, any
rule that does so would violate the repeated variable
restriction.

The restrictions imposed on the reasoning system also
imply that it is not possible to apply the abstract notion
of transitivity in a reflexive manner when answering
queries. Observe that we need to state
$\forall x,y,z\; P(x,y) \land P(y,z) \Rightarrow P(x,z)$ in order to assert that the relation P is
transitive and the rule has the variable y occurring twice
in the antecedent but not even once in the consequent.
Given that transitivity plays an important role in commonsense
reasoning - to wit, reasoning about sub- and
supercategories, part-of relationships, greater and less
than - the inability to handle transitivity might appear to
be overly limiting. However, this is not the case. We
believe that as far as query answering is concerned,
humans are only good at dealing with the transitivity of a
small number of relations. In these cases, the transitivity
of the appropriate relations is encoded explicitly and the
computation of transitivity does not require the use of an
abstract transitivity rule. The organization of concepts in
an IS-A hierarchy using IS-A links to capture the
subclass/superclass relationship is an excellent case in point.
The use of IS-A links converts the problem of computing
the transitive closure from one of applying the transitivity
rule $\forall x,y,z\; \text{IS-A}(x,y) \land \text{IS-A}(y,z) \Rightarrow \text{IS-A}(x,z)$, to one of
propagating activation along links.
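
The contrast between applying a transitivity rule and propagating along links can be sketched as follows. The code is a sequential stand-in for what the model does in parallel by spreading activation, and the small hierarchy (including the concepts Canary and Animal) is a hypothetical example of ours.

```python
from collections import deque

def superconcepts(is_a, start):
    """Collect every concept reachable from start by following IS-A links upward.
    No transitivity rule is applied; the closure falls out of the traversal."""
    seen = set()
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for parent in is_a.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen

# Hypothetical IS-A links: each concept maps to its immediate superconcepts.
IS_A = {"Tweety": ["Canary"], "Canary": ["Bird"], "Bird": ["Animal"]}

print(sorted(superconcepts(IS_A, "Tweety")))
```

Only the direct links are stored; the fact that Tweety is an Animal is never stated and never derived by a rule with a repeated variable, yet it is recovered by following links.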

8.2.6. Bound on the depth of the chain of reasoning. Two
things might happen as activity propagates along a chain
of argument ensembles during an episode of reflexive
reasoning. First, the lag in the firing times of successive
ensembles may gradually build up due to the propagation
delay introduced at each level in the chain. Second, the
dispersion within each ensemble may gradually increase
due to the variations in propagation delays and the noise
inherent in synaptic and neuronal processes. Whereas
the increased lag along successive ensembles will lead to a
“phase shift” and, hence, binding confusions, the increased
dispersion of activity within successive ensembles
will lead to a gradual loss of binding information.
Increased dispersion would mean less phase specificity
and, hence, more uncertainty about the argument’s filler.
Because of the increase in dispersion along the chain of
reasoning, the propagation of activity will correspond less
and less to a propagation of argument bindings and more
and more to an associative spread of activation. For
example, the propagation of activity along the chain of
rules $P_1(x,y,z) \Rightarrow P_2(x,y,z) \Rightarrow \ldots \Rightarrow P_n(x,y,z)$ resulting from
the input $P_1(a,b,c)$ may lead to a state of activation where
all one can say about $P_n$ is that there is an instance of $P_n$
involving the entities a, b, and c, but it is not clear which
entity fills which role of $P_n$.

It follows, then, that the depth to which an agent may
reason during reflexive reasoning is bounded. Thus an
agent may be unable to make a prediction (or answer a
query) - even when the prediction (or answer) logically
follows from the knowledge encoded in the LTKB - if the
length of the derivation leading to the prediction (or the
answer) exceeds this bound. It should be possible to relate
the bound on the depth of reflexive reasoning to specific
physiological parameters, but at this time we are not
aware of the relevant data upon which to base such a
prediction. We would welcome pointers to appropriate data.

8.3. Nature of inputs to the reflexive reasoner

Our system demonstrates that rulelike knowledge may be
used effectively during reflexive reasoning, provided it is
integrated into the LTKB and wired into the inferential
dependency graph. It also demonstrates that reflexive
reasoning can effectively deal with small dynamic input in
the form of facts.³³ We suspect that the ability of any
reflexive-reasoning system to deal with novel rulelike
information will be extremely limited; if the input contains
rulelike information that is not already present in the
LTKB, the agent may have to revert to a reflective mode
of reasoning in order to use this information. This may
partially explain why human agents find it difficult to
perform syllogistic reasoning without deliberate and conscious
effort even though, in a formal sense, such reasoning
is simpler than some of the reasoning tasks we can
perform in a reflexive manner. In syllogistic reasoning,
the “input” has the form of rules and the reflexive reasoner
may be unable to use them unless they are already part of
the LTKB.

8.4.  The  reflexive-reasoning  system  and  production 
systems 

As may be evident, there exists a correspondence between
a production system and the reflexive-reasoning
system described in this article - the declarative memory
corresponds to long-term facts, productions correspond
to rules, and the working memory corresponds to the
WMRR. Thus our system can be viewed as a parallel
production system.

Estimates of the working-memory capacity of production
system models range from very small (about seven
elements) to essentially unlimited. Our work points out
that the working memory of a reflexive processor can
contain a very large number of elements (dynamic facts in
the case of the reasoning system) as long as (1) the
elements do not refer to more than (about) 10 entities, and
(2) the elements do not involve the same relation (predicate)
more than (about) three times. The proposed system
also demonstrates that a large number of rules, even those
containing variables, may fire in parallel as long as any
predicate is not instantiated more than (about) three times
(cf. Newell’s suggestion [1980] that while productions
without variables can be executed in parallel, productions
with variables may have to be executed in a serial fashion).
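
The two capacity conditions can be expressed as a simple check over a set of working-memory elements. This is a sketch under the approximate limits quoted above (about 10 entities, about three instantiations per predicate); the representation of a dynamic fact as a (predicate, arguments) pair and the name `fits_in_wmrr` are assumptions made for illustration.

```python
from collections import Counter

MAX_ENTITIES = 10        # approximate bound on distinct entities
MAX_INSTANTIATIONS = 3   # approximate bound on instantiations per predicate

def fits_in_wmrr(facts, max_entities=MAX_ENTITIES, max_inst=MAX_INSTANTIATIONS):
    """Check both conditions for a list of (predicate, argument_tuple) facts."""
    entities = {e for _, args in facts for e in args}
    per_predicate = Counter(pred for pred, _ in facts)
    within_entity_bound = len(entities) <= max_entities
    within_predicate_bound = all(n <= max_inst for n in per_predicate.values())
    return within_entity_bound and within_predicate_bound

facts = [
    ("loves", ("John", "Susan")),
    ("loves", ("Susan", "Tom")),
    ("jealous-of", ("John", "Tom")),
]
print(fits_in_wmrr(facts))
```

Note that the total number of facts is not bounded directly: many facts may be held as long as they involve many different predicates and few distinct entities.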

A number of cognitive models are based on the production
system formalism; two of the most comprehensive
are ACT* (Anderson 1983) and SOAR (Newell 1990). Neurally
plausible realizations of these models, however, have
not been proposed. Although several aspects of ACT* such
as its use of levels of activation and weighted links have
neural underpinnings, it has not been shown how certain
critical aspects of the model may be realized in a neurally
plausible manner. For example, ACT* represents productions
with variables, but Anderson does not provide a
neurally plausible explanation of how bindings are propagated
and how nodes determine whether two bindings are
the same. In his exposition of SOAR, Newell has analyzed
the time course of neural processes to estimate how long
various SOAR operations should take, but he did not
suggest how a system such as SOAR may be realized in a
neurally plausible manner (see Newell 1990, p. 440).
Although a complete mapping of comprehensive systems
such as SOAR and ACT* to a neurally plausible architecture
still remains an open problem, our system could provide a
basis for doing so. In this context, the biologically motivated
constraints on the capacity of the WMRR indicated
by our system seem particularly significant.

8.5. Reflexive reasoning and text understanding

Several problems will have to be addressed in order to
integrate the proposed reasoning system with a comprehensive
cognitive system. Some of these problems are
discussed in section 10; they include (1) interactions
between the reflexive-reasoning system and medium-term
memory; (2) how medium-term memory is mapped
into long-term memory; (3) how the set of entities in the
WMRR changes in a fluid manner; and (4) how distinct
modules performing different reflexive processes (e.g., a
parser and a reasoner) communicate with one another.

The problem of text understanding is particularly relevant
because there exists a rich body of empirical data on
the role of inferences based on long-term knowledge
during language understanding. The data strongly suggest
that certain types of inferences (i.e., inferences that
help establish referential and causal coherence) do occur
very rapidly and automatically during text understanding
(see, e.g., Carpenter & Just 1977; Keenan et al. 1984;
Kintsch 1974; McKoon & Ratcliff 1980). The evidence for
the automatic occurrence of elaborative inferences, however,
is mixed (see, e.g., Kintsch 1988; McKoon & Ratcliff
1986; Potts et al. 1988; Singer & Ferreira 1983). Elaborative
inferences predict highly likely consequences of
events mentioned in the discourse and correspond to
forward reasoning in our system. However, as Potts et al.
(1988) point out, available experimental evidence does
not rule out the possibility that elaborative inferences are
performed during reading. The experiments involve
two-sentence texts, and it is likely that the subjects do not have
any inherent interest in making predictive inferences. It
may turn out that subjects do make such inferences when
reading longer texts.

Our system suggests that reflexive reasoning can occur
in backward as well as forward direction (although, as
pointed out in sect. 8.2, there are critical differences in
the form of rules that participate in the two types of
reasoning). This suggests that agents may perform inferences
required for establishing referential and causal
coherence as well as predictive inferences in a reflexive
manner. The system’s prediction can be resolved with the
observed data if we assume that the results of predictive
inferences only last for a short time (say a few hundred
msec) and then disperse unless subsequent input (text)
indicates that these inferences are significant and/or relevant
to the discourse. Only those inferred facts that turn
out to be relevant are encoded in medium-term memory
and become available for a longer time.

The extensive body of empirical data on the role of
long-term knowledge and inferences in reading will inform
future work on our model of reflexive reasoning. At
the same time, we hope that the constraints on the form of
rules and the capacity of the working memory underlying
reflexive reasoning that have emerged from our work will
help experimental psychologists in formulating and testing
novel hypotheses about the role of reflexive reasoning
in reading.

8.6. Reflexive reasoning and the fan effect

Our  initial  hypothesis  as  well  as  our  system’s  behavior 
suggests  that  the  time  taken  by  reflexive  reasoning  is 
independent  of  the  size  of  the  LTKB.  This  conflicts  with 
the  fan  effect  (Anderson  1983;  Reder  &  Ross  1983);  this 
effect  generally  refers  to  the  following  phenomenon:  The 
more  facts  associated  with  a  particular  concept,  the  slower 
the  recognition  of  any  one  of  the  facts.  We  hypothesize 
that the fan effect applies only to medium-term knowledge
and not to long-term knowledge (we use long-term
in  the  sense  discussed  in  sect.  1.1).  Consider  the  nature  of 
the  task  that  leads  to  the  fan  effect.  An  agent  studies  a  set 
of  facts  until  he  can  recall  them.  Subsequently,  the  agent 
is  asked  to  recognize  and  make  consistency  judgments 
about  the  learned  material  and  his  reaction  times  are 
recorded.  It  is  observed  that  the  time  taken  to  recognize  a 
fact  increases  with  the  number  of  facts  studied  by  the 
agent  involving  the  same  concept(s).  Observe,  however, 
that  the  fan  effect  concerns  an  arbitrary  collection  of  facts 
that the agent studied prior to the experiment. We hypothesize
that these facts are only encoded in the agent’s
medium-term memory and are not assimilated into the
agent’s LTKB. Thus the fan effect is not about facts in the
LTKB; rather, it is about facts in medium-term memory.

9.  Related  work 

In spite of the apparent significance of reflexive reasoning
there have been few attempts at modeling such rapid
inference with reference to a large body of knowledge.
Some past exceptions are Fahlman’s (1979) work on NETL
and Shastri’s (1988a) work on a connectionist semantic
memory (see also Geller & Du 1991). Both these models
primarily deal with inheritance and classification within
an IS-A hierarchy. Holldobler (1990) and Ullman and van
Gelder (1988) have proposed parallel systems for performing
more powerful logical inferences, but these systems
have unrealistic space requirements. The number of
nodes in Holldobler’s system is quadratic in the size of the
LTKB, and the number of processors required by Ullman
and van Gelder is even higher. A significant amount of
work has been done by researchers in knowledge representation
and reasoning to identify classes of inference
that can be performed efficiently (e.g., see Bylander et al.
1991; Frisch & Allen 1982; Kautz & Selman 1991; Levesque
1988; Levesque & Brachman 1985; McAllester
1990). The results, however, have largely been negative.
The few positive results reported do not provide insights
into the problem of reflexive reasoning because they
assume a weak notion of efficiency (polynomial time),
restrict inference in implausible ways (e.g., by excluding
chaining of rules), and/or deal with overly limited
expressiveness (e.g., only propositional calculus).

9.1. Relation between NETL and the proposed system

It was pointed out in section 3 that as an abstract computational
mechanism temporal synchrony is related to the
notion of marker passing. It was also mentioned that
Fahlman (1979) had proposed the design of a parallel
marker-passing machine (NETL) that could solve a class of
inheritance and recognition problems efficiently. But as
discussed in section 3, NETL was not neurally plausible.
In view of the correspondence between temporal synchrony
and marker passing, our system offers a neurally
plausible realization of marker passing. It is important to
underscore the significance of this realization. First,
nothing is stored at a node in order to mark it with a
marker. Instead, the time of firing of a node relative to
other nodes and the coincidence between the time of
firing of a node and that of other nodes has the effect of
marking a node with a particular marker. Furthermore, a
node does not have to match anything akin to markers. It
simply has to detect whether appropriate inputs are
coincident. Second, the system does not require a central
controller. Once a query is posed to the system by
activating appropriate nodes, it computes the solution
without an external controller directing the activity of
every node at every step of processing. The system’s
ability to do so stems from the distributed control mechanisms
that are an integral part of the representation.
Some examples of such built-in mechanisms that automatically
control the propagation of activation are the $C\uparrow$
and $C\downarrow$ relay nodes in concept clusters (sect. 5.2), and the
switch networks associated with concepts and predicates
that automatically direct the flow of activation to unused
banks (sect. 5.2 & 6). Third, our realization of marker
passing quantifies the capacity of the working memory
underlying reflexive processing in terms of biological
parameters. As we have seen, these constraints have
psychological significance.

In  addition  to  demonstrating  that  a  marker-passing 
system  can  be  realized  in  a  neurally  plausible  manner,  our 
system  shows  that  a  richer  class  of  representation  and 
reasoning problems than that realized in NETL can be
solved using temporal synchrony - and, hence, marker
passing. If we set aside the issue of exceptional knowledge
(see below), NETL represented an IS-A hierarchy and
n-ary facts, where terms in a fact could be types or
instances in the IS-A hierarchy. NETL, however, did not
represent rules involving n-ary predicates. NETL derived
inherited facts by replacing terms in a fact by their
subtypes or instances (this characterization accounts for
NETL's ability to perform simple [unary] inheritance as
well as relational inheritance), but it did not combine
inheritance with rule-based reasoning. Consider the example
of relational inheritance where preys-on(Sylvester,
Tweety) is derived from preys-on(Cat, Bird). Observe
that this only involves substituting Sylvester for Cat and
Tweety for Bird on the basis of the IS-A relations
is-a(Sylvester, Cat) and is-a(Tweety, Bird). This form of
reasoning is weaker than that performed by our system.
Our reasoning system can also encode rules such as ∀x,y
preys-on(x,y) ⇒ scared-of(y,x), and given preys-on(Cat,
Bird) it can not only infer preys-on(Sylvester, Tweety) but
also scared-of(Tweety, Sylvester).
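
The difference between relational inheritance alone and inheritance combined with a rule can be made concrete with a small symbolic sketch. It only illustrates the inferences involved, using ordinary set operations rather than the connectionist mechanism described in this article; the names `IS_A`, `relational_inheritance`, and `apply_scared_of_rule` are hypothetical.

```python
# Symbolic illustration only (not the connectionist encoding).
IS_A = {"Sylvester": "Cat", "Tweety": "Bird"}  # instance -> type


def instances_of(type_name):
    """Return all instances whose type is type_name."""
    return [inst for inst, ty in IS_A.items() if ty == type_name]


def relational_inheritance(fact):
    """Substitute instances for the types appearing in a binary fact."""
    pred, a, b = fact
    return {(pred, x, y) for x in instances_of(a) for y in instances_of(b)}


def apply_scared_of_rule(facts):
    """Apply: for all x, y, preys-on(x, y) => scared-of(y, x)."""
    return {("scared-of", y, x) for (p, x, y) in facts if p == "preys-on"}


inherited = relational_inheritance(("preys-on", "Cat", "Bird"))
derived = apply_scared_of_rule(inherited)
```

The first step is the substitution that the text attributes to NETL; the second step, which swaps the argument order according to the rule, is the additional inference the text attributes to the proposed system.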

The presence of a central controller allowed NETL to
compute and enumerate results of queries involving an
arbitrary sequence of set intersection and set union operations.
NETL's central controller could decompose a query
into the required sequence of intersection and union
operations and instruct NETL to perform these operations
in the proper sequence. This is something our reflexive-reasoning
system does not (and is not intended to) do.

NETL also allowed exceptions in the IS-A hierarchy, but
its treatment of exceptions suffered from serious semantic
problems (see Fahlman et al. 1981; Touretzky 1986). In
sections 5.4 and 5.5 we described how rules with type
restrictions are encoded in our system and explained how
this encoding may be extended to deal with type preferences
so that the appropriateness - or strength - of a rule
firing in a specific situation may depend on the types of
the entities involved in that situation. The ability to
encode evidential rules will allow our system to incorporate
exceptional and default information in an IS-A hierarchy
(see below).

9.2. CSN: A connectionist semantic memory

Shastri (1988a; 1988b) developed CSN, a connectionist
semantic network that could solve a class of inheritance
and classification problems in time proportional to the
depth of the conceptual hierarchy. CSN computed its
solutions in accordance with an evidential formalization
and dealt with exceptional and conflicting information in a
principled manner. It found the most likely answers to
inheritance and recognition queries by combining the
information encoded in the semantic network. CSN operated
without a central network controller that regulated
the activity of its nodes at each step of processing. This
was the result of using distributed mechanisms (e.g.,
relay nodes) for controlling the flow of activity. A complete
integration of a CSN-like system and the proposed reasoning
system should lead to a system capable of dealing with
evidential and conflicting rules and facts in a principled
manner.

9.3. Some connectionist approaches to the dynamic-binding problem

Feldman (1982) addressed the problem of dynamically
associating any element of a group of N entities with any
element of another group of N entities using an interconnection
network. He showed how it was possible to
achieve the association task with an interconnection network
having only 4N^(3/2) nodes. The work, however, did not
address how such a representation could be incorporated
within a reasoning system where bindings need to be
propagated.

Touretzky and Hinton (1988) developed DCPS, a distributed
connectionist production system, to address the
problem of rule-based reasoning within a connectionist
framework. The ability of DCPS to maintain and propagate
dynamic bindings is, however, quite limited. First, DCPS
can only deal with rules that have a single variable.
Second, DCPS is serial at the knowledge level, because
each step in its reasoning process involves selecting and
applying a single rule. Thus in terms of efficiency, DCPS is
similar to a traditional (serial) production system and must
deal with the combinatorics of search. Third, it assumes
that there is only one candidate rule that can fire at each
step of processing. Hence it is not a viable model of
reflexive reasoning.

Smolensky (1990) describes a representation of dynamic
bindings using tensor products. Arguments and
fillers are viewed as n- and m-dimensional vectors, respectively,
and a binding is viewed as the n × m dimensional
vector obtained by taking the tensor product of the
appropriate argument and filler vectors. The system encodes
arguments and fillers as patterns over pools of
n argument and m filler nodes and argument bindings
over a network of n × m nodes. The system can only
encode n × m bindings without cross-talk, although a
greater number of bindings can be stored if some cross-talk
is acceptable. Dolan and Smolensky (1989) describe
TPPS, a production system based on the tensor product
encoding of dynamic bindings. However, like DCPS, TPPS
is also serial at the knowledge level and allows only one
rule to fire at a time.
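
A tensor-product binding of the kind described above can be sketched with plain lists. This is a minimal illustration under simplifying assumptions, not Smolensky's system: argument vectors are taken to be orthonormal (one-hot) so that unbinding is exact, and the names `outer`, `add`, `unbind`, `roles`, and `fillers` are introduced only for this example.

```python
# Minimal tensor-product binding sketch in pure Python.
def outer(u, v):
    """Tensor (outer) product of an n-vector and an m-vector: n x m grid."""
    return [[ui * vj for vj in v] for ui in u]


def add(a, b):
    """Superimpose two n x m binding patterns."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def unbind(memory, role):
    """Recover the filler bound to `role` (exact for orthonormal roles)."""
    m = len(memory[0])
    return [sum(role[i] * memory[i][j] for i in range(len(role))) for j in range(m)]


# Assumed example vectors: n = 2 argument units, m = 3 filler units.
roles = {"giver": [1, 0], "recipient": [0, 1]}
fillers = {"John": [1, 0, 0], "Mary": [0, 1, 0]}

memory = add(
    outer(roles["giver"], fillers["John"]),
    outer(roles["recipient"], fillers["Mary"]),
)
```

The superimposed pattern `memory` occupies n × m = 6 units, and `unbind(memory, roles["giver"])` returns the vector for John. With non-orthogonal argument vectors the same operation would return a mixture of fillers, which is the cross-talk mentioned in the text.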

The  primary  cause  of  knowledge-level  serialism  in  DCPS 
and  TPPS  is  that  these  systems  represent  arguments  and 
fillers  as  patterns  of  activity  over  common  pools  of  nodes. 
This  severely  limits  the  number  of  arguments,  fillers,  and 
dynamic  bindings  that  may  be  represented  at  the  same 
time.  In  contrast,  the  compact  encoding  of  predicates, 
arguments,  and  concepts  in  our  system  allows  it  to  repre¬ 
sent  and  propagate  a  large  number  of  dynamic  bindings 
simultaneously. 

Another system that uses compact encoding and supports
knowledge-level parallelism is ROBIN (Lange &
Dyer 1989). This system was designed to address the
problem of natural-language understanding - in particular,
the problem of ambiguity resolution using evidential
knowledge. ROBIN and our system have several features in
common; for example, ROBIN can also maintain a large
number of dynamic bindings and encode “rules” having
multiple variables. There are also important differences:
ROBIN permanently allocates a unique numerical signature
to each constant in the domain and represents dynamic
bindings by propagating the signature of the appropriate
constant to the argument(s) to which it is bound.
The use of signatures allows ROBIN to deal with a large
number of entities during an episode of reasoning. There
is, however, a potential problem with the use of signatures:
If each entity has a unique signature, then signatures
can end up being high-precision quantities. For
example, assigning a distinct signature to 50,000 concepts
will require a precision of 16 bits. Hence propagating
bindings would require nodes to propagate and compare
high-precision analog values. This problem may be circumvented
by representing signatures as n-bit vectors
and encoding arguments as clusters of n nodes communicating
via bundles of links (see sect. 9.4).
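
The 16-bit figure follows from counting how many binary digits are needed to give each of 50,000 concepts a distinct code. The short sketch below checks that arithmetic and shows what an n-bit signature vector might look like; the function names and the idea of numbering concepts by an integer index are assumptions made for illustration, not a description of ROBIN.

```python
def signature_bits(num_concepts):
    """Smallest n such that 2**n >= num_concepts (distinct binary codes)."""
    return max(1, (num_concepts - 1).bit_length())


def signature_vector(index, n):
    """Encode a concept index as an n-bit vector, most significant bit first."""
    return [(index >> shift) & 1 for shift in range(n - 1, -1, -1)]


n = signature_bits(50_000)
example = signature_vector(5, 4)
```

Since 2^15 = 32,768 is less than 50,000 and 2^16 = 65,536 is not, `n` evaluates to 16, matching the text. Under the circumvention mentioned above, each argument would then be a cluster of 16 nodes carrying one bit apiece.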

The temporal-synchrony approach can be compared to
the signature-based approach as follows: Although the
total number of entities is very large, the number of
entities involved in a particular reasoning episode is
small. Hence, instead of assigning a distinct signature to
every entity, it suffices to assign distinct signatures only to
entities that are participating in an episode of reasoning.
Furthermore, this assignment need exist only for the
duration of a reasoning episode. One can interpret the
relative phase in which a node is firing as such a transient
signature of the node. The discussion in section 8.2 about
working memory and medium-term memory (also sect.
10) suggests how an augmented system that includes
medium-term memory may engage in tasks involving
more than 10 or so entities.

Barnden  and  Srinivas  (1991)  have  proposed  Conposit,  a 
connectionist  production  system.  In  Conposit,  patterns 
are  associated  by  virtue  of  the  relative  position  of  registers 
containing  these  patterns,  as  well  as  the  similarity  be¬ 
tween  patterns.  Argument  bindings  are  propagated  by  a 
connectionist  interpreter  that  reads  the  contents  of  regis¬ 
ters  and  updates  them.  We  believe  that  Conposit  may  be 
an appropriate architecture for modeling complex reflective
processes, but it may not be best suited for modeling
reflexive reasoning.

Another  solution  to  the  binding  problem  is  based  on 
frequency  modulation,  whereby  dynamic  bindings  may 
be  encoded  by  having  the  appropriate  nodes  fire  with  the 
same frequency (Tomabechi & Kitano 1989).


9.4. Using patterns for propagating bindings

An  important  aspect  of  the  proposed  reasoning  system  is 
the  organization  of  n-ary  rules  into  a  directed  graph, 
wherein the inferential dependencies between antecedent
and consequent predicates together with the correspondence
between the predicate arguments are represented
explicitly. As we have seen, this encoding in
conjunction with the temporal representation of dynamic
bindings leads to an efficient reasoning system. But the
above encoding of rules is significant in its own right. One
may take this framework for organizing rules and obtain
other organizationally isomorphic connectionist systems
by using alternative techniques (e.g., frequency encoding)
for representing dynamic bindings. These systems,
however, will differ in the size of the resulting network,
constraints on the nature of reasoning, reasoning speed,
and biological plausibility. To illustrate how the suggested
organization of rules and arguments may be combined
with alternate techniques for propagating dynamic bindings,
we use the proposed encoding of rules in conjunction
with what may be referred to as the pattern-containment
approach.^15

In  the  pattern-containment  approach  we  assume  that 
each  argument  is  represented  by  a  cluster  of  n  nodes,  and 
inferential  links  between  arguments  are  represented  by 
connecting  the  nodes  in  the  associated  argument  clusters. 
An  n-dimensional  pattern  of  activity  is  associated  with 
each  concept  (i.e.,  an  instance  or  a  type),  and  a  dynamic 
binding between a concept and an argument is represented
by inducing the pattern of activation associated
with the concept in the appropriate argument cluster.
The propagation of dynamic bindings in the system occurs
by the propagation (replication) of patterns of activity
along connected argument clusters.

It is instructive to compare the pattern-containment
approach with the temporal-synchrony approach. The
key question is: What is the significance of the pattern of
activity that is associated with a concept and propagated
across argument clusters? One possibility is that each
n-dimensional pattern encodes the signature associated
with some concept (Lange & Dyer 1989). As we pointed
out earlier, the value of n would depend on N, the number
of distinct concepts represented in the system. If we
assume that concepts are assigned arbitrary patterns as
signatures, n would equal log₂ N. Alternatively the pattern
of activity could encode all the microfeatures of a
concept (Hinton 1981; Rumelhart & McClelland 1986).
Such a pattern, however, would have to be even larger.
Both of these interpretations of patterns make suboptimal
use of computational resources: Each argument cluster
has to be large enough to encode the full signature of a
concept or all the microfeatures associated with a concept.
Also, individual bindings have to be propagated by
propagating large patterns of activity. An attractive alternative
would be to assume that the patterns associated
with concepts during the propagation of bindings are
some sort of “reduced descriptions.” We suggest that the
temporal-synchrony approach does exactly this - albeit in
an unusual manner. During the propagation of bindings,
the relative phase of firing of an active concept acts as a
highly reduced description of that concept.

The  use  of  temporal  synchrony  enables  our  system 
to  do  with  one  node  and  one  link  what  the  pattern- 
containment  approach  does  using  n  nodes  and  links.  The 
temporal  approach  also  leads  to  a  simple  encoding  of  long¬ 
term  facts.  In  contrast,  the  realization  of  a  long-term  fact 
in  the  pattern-containment  approach  will  be  more  com¬ 
plex  since  it  must  support  mn-bit  comparisons  (where  m  is 
the  arity  of  the  fact  predicate)  to  check  whether  the 
dynamic  bindings  match  the  static  bindings  encoded  in 
the  fact.  In  section  7.3  we  suggested  that  single  (idealized) 
nodes  in  our  system  would  have  to  be  mapped  to  ensem¬ 
bles  of  nodes  and  single  (idealized)  links  would  have  to  be 
mapped  to  a  group  of  links.  This  mapping,  however,  was 
required  to  deal  with  noise  in  the  system  and  the  pattern- 
containment  approach  will  also  have  to  be  augmented  in 
order  to  deal  with  noise. 
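
The resource comparison in this paragraph can be tabulated with a small sketch. It counts only the idealized quantities named in the text (n nodes and links per argument, and mn-bit comparisons per fact) and ignores the ensembles needed for noise tolerance; the function names and dictionary keys are hypothetical, and the values n = 16 and m = 3 are chosen only as an example.

```python
def pattern_containment_cost(n, m):
    """Idealized cost when each binding is an n-bit pattern and a fact has arity m."""
    return {
        "nodes_per_argument": n,
        "links_per_connection": n,
        "fact_match_bits": m * n,  # the mn-bit comparison mentioned in the text
    }


def temporal_synchrony_cost():
    """Idealized cost when a binding is carried by the phase of a single node."""
    return {"nodes_per_argument": 1, "links_per_connection": 1}


# Example: 16-bit patterns (as for 50,000 concepts) and a ternary fact.
pattern_cost = pattern_containment_cost(n=16, m=3)
synchrony_cost = temporal_synchrony_cost()
```

For these example values the pattern-containment fact must support a 48-bit comparison, whereas the temporal encoding uses one node and one link per argument before any allowance for noise.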


10.  Discussion 

We  have  presented  a  neurally  plausible  model  for  knowl¬ 
edge  representation  and  reflexive  reasoning.  The  model 
supports  the  long-term  encoding  of  general  instantiation- 
independent  structures  as  well  as  specific  situations 
involving  n-ary  relations.  It  also  supports  the  representa¬ 
tion  of  dynamic  information  and  its  interaction  with  long¬ 
term  knowledge.  Everything  presented  in  this  target 
article,  except  for  the  treatment  of  soft  rules  (sect.  5.5), 
has  been  simulated.  The  proposed  model  makes  several 
specific  predictions  about  the  nature  of  reflexive  reason¬ 
ing  and  the  capacity  of  the  working  memory  underlying 
reflexive  reasoning.  These  predictions  are  verifiable  and 
we  hope  that  they  will  be  explored  by  experimental 
psychologists. The proposed representational mechanisms
are quite general and should be applicable to other
problems in cognition whose formulation requires the
expressive power of n-ary predicates and whose solution
requires rapid and systematic interactions between long-term
and dynamic structures. These include problems in
high-level vision, other problems in language processing
such as syntactic processing, and reactive planning. Below
we discuss some problems that need to be addressed
if the representational mechanisms proposed here are to
be applied in an extended setting.

10.1. Where do phases originate?

In a sense, the “source” of rhythmic activity in the proposed
reasoning system is clearly identifiable: The process
that poses a query to the system provides staggered
oscillatory inputs to entities mentioned in the query and
thereby activates them in distinct phases. In a composite
perceptual/linguistic/reasoning system, however, such a
separation in the phase of the firing of distinct entities
must occur intrinsically. For example, the utterance “John
gave Mary Book1” should automatically result in the
representations of “John,” “Mary,” and “Book1” firing in
different phases and synchronously with giver, recipient,
and give-obj, respectively.

The problems of automatic phase separation and consequent
segmentation and feature binding have been addressed
by several researchers. For example, Horn et al.
(1991)  demonstrate  how  an  input  pattern  containing  a  red 
square  and  a  blue  circle  can  result  in  the  firing  of  nodes 
representing  the  features  “red”  and  “square”  in  one 
phase,  and  the  nodes  representing  the  features  “blue”  and 
“circle”  in  a  different  phase.  The  model,  however,  does 
not  work  if  there  are  more  than  two  objects.  An  internal 
attentional  mechanism  similar  to  the  “searchlight”  pro¬ 
posed  by  Crick  (1984)  may  be  required  for  dealing  with 
more  elaborate  situations. 

In  the  case  of  linguistic  input,  we  believe  that  the  initial 
phase  separation  in  the  firing  of  each  constituent  is  the 
outcome  of  the  parsing  process.  The  parser  module  ex¬ 
presses  the  result  of  the  parsing  process  -  primarily  the 
bindings  between  syntactic  arguments  and  constituents  - 
by  forcing  appropriate  nodes  to  fire  in  and  out  of  syn¬ 
chrony.  This  is  illustrated  in  a  parser  for  English,  de¬ 
signed  by  Henderson  (1991),  using  the  proposed  model 
for  reflexive  reasoning. 

10.2.  Who  reads  the  synchronous  firing  of  nodes? 

There is no homunculus in our system that “reads” the
synchronous activity to detect dynamic bindings. Instead,
the synchronous activity is “read” by various long-term
structures in the system that do so by simply
detecting coincidence (or the lack of it) among their
inputs. For example, long-term facts read the rhythmic
activity as it propagates past them and become active
whenever the dynamic bindings encoded in the activity
are appropriate. Similarly, τ-or nodes enforce type restrictions
(e.g., the node a in Fig. 24) by enabling the
firing of a rule whenever the appropriate argument and
type nodes are firing in-phase. We have also designed a
connectionist mechanism that automatically extracts answers
to wh-queries and relays them to an output device
(McKendall 1991). We associate a code or a “name” with
each concept. This name has no internal significance and
is meant solely for communicating with the system's
environment. The mechanism channels the names of
concepts that constitute an answer to an output buffer in
an interleaved fashion. For example, the patterns for
Ball1 and Book1 would alternate in the output buffer after
the wh-query own(Mary, x)? is posed with reference to
the network in Figure 12.

10.3. How are phases recycled?

The  constraint  that  computations  must  involve  only  a 
small  number  of  entities  at  any  given  time  seems  reason¬ 
able  if  we  restrict  ourselves  to  a  single  episode  of  reason¬ 
ing,  understanding  a  few  sentences,  or  observing  a  simple 
scene. But what happens when the agent is participating
in a dialogue or scanning a complex scene where the total
number of significant entities exceeds the number of
distinct phases that can coexist? In such situations the set
of entities in “focus” must keep changing constantly, with
entities shifting in and out of focus in a dynamic manner.
Identifying  the  mechanisms  that  underlie  such  internal 
shifts  of  attention  and  cause  the  system’s  oscillatory  activ¬ 
ity  to  evolve  smoothly  so  that  new  entities  start  firing  in  a 
phase  while  entities  presently  firing  in  a  phase  gradually 
“release”  their  phase  remains  a  challenging  open  problem 
(but  see  Crick  &  Koch  1990a).  In  this  context  one  must 
also  note  that  the  notion  of  an  entity  is  itself  very  fluid.  In 
certain  situations,  John  may  be  an  appropriate  entity.  In 
other  situations,  John’s  face  or  perhaps  even  John’s  nose 
may  be  the  appropriate  entity. 

The notion of the release of phases has a natural
interpretation in the parsing system described by Henderson
(1992). The parser is incremental and its output is a
sequence of derivation steps that leads to the parse. The
entities in the parser are nonterminals of the grammar,
and hence each active nonterminal must fire in a distinct
phase. Under appropriate conditions during the parsing
process - for example, when a nonterminal ceases to be
on the right frontier of the phrase structure - the phase
associated with a nonterminal can be “released” and,
hence, become available for nonterminals introduced by
subsequent words in the input. This allows the parser to
recover the structure of arbitrarily long sentences as long as
the dynamic state required to parse the sentence does not
exceed the bounds on the number of phases and the
number of instantiations per predicate.
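
The release and reuse of phases can be pictured as a small pool of reusable slots. The sketch below is only a bookkeeping analogy, not a model of the oscillatory dynamics or of Henderson's parser; the class name `PhasePool`, its methods, and the nonterminal labels are hypothetical.

```python
class PhasePool:
    """Bookkeeping analogy for a bounded set of reusable phases."""

    def __init__(self, num_phases=10):
        self.free = list(range(num_phases))  # phases not currently in use
        self.owner = {}  # entity -> phase it is firing in

    def acquire(self, entity):
        """Give an entity a distinct phase, or fail if none is free."""
        if entity in self.owner:
            return self.owner[entity]
        if not self.free:
            raise RuntimeError("no free phase: bound on active entities exceeded")
        phase = self.free.pop(0)
        self.owner[entity] = phase
        return phase

    def release(self, entity):
        """Release an entity's phase so that later entities can reuse it."""
        self.free.append(self.owner.pop(entity))


# A pool of two phases is enough to show recycling.
pool = PhasePool(num_phases=2)
first = pool.acquire("NP1")
second = pool.acquire("VP1")
pool.release("NP1")            # e.g., NP1 leaves the right frontier
reused = pool.acquire("NP2")   # NP2 takes over the released phase
```

The input can be arbitrarily long as long as the number of simultaneously active entities never exceeds the size of the pool, which is the condition stated in the text.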

10.4. Generalizing the use of synchronous oscillations

Thus  far  we  have  assumed  that  the  scope  of  phase  distri¬ 
bution  is  the  entire  system.  We  must,  however,  consider 
the  possibility  where  the  system  is  composed  of  several 
modules  (say  the  perceptual,  linguistic,  or  reasoning  mod¬ 
ules).  If  we  combine  the  requirements  of all  these  modules 
it  becomes  obvious  that  ten  or  so  phases  will  be  inade¬ 
quate  for  representing  all  the  entities  that  must  remain 
active  at  any  given  time.  Thus  a  temporal  coding  of 
dynamic  bindings  is  not  viable  if  a  single  phase  distribu¬ 
tion  must  extend  across  all  the  modules.  Therefore  it 
becomes  crucial  that  each  module  has  its  own  phase 
distribution  so  that  each  module  may  maintain  bindings 
involving ten or so entities. This, however, poses a problem:
How should modules communicate with each other
in a consistent manner? Consider a system whose visual
module is seeing “John” and whose conceptual module is
thinking something about John. How should the visual
and conceptual modules share information about John
even though the phase and frequency of the nodes encoding
John in the two systems may be different? Aaronson
(1991) describes a connectionist interface that allows two
phase-based modules, each with its own phase structure,
to exchange binding information.

10.5. Memorizing facts: Converting dynamic bindings to
static patterns

In  the  proposed  system,  dynamic  information  is  repre¬ 
sented  as  transient  rhythmic  activity  and  long-term  mem¬ 
ory  is  encoded  by  using  “hard-wired”  interconnections 
between  nodes.  We  have  not  discussed  how  appropriate 
dynamic  information  may  be  converted  into,  and  re¬ 
corded  as,  synaptically  encoded  long-term  structures.  A 
specific  problem  concerns  the  conversion  of  dynamic 
bindings  corresponding  to  a  novel  (but  salient)  fact  into  a 
medium-term  fact  by  converting  the  set  of  dynamic  bind¬ 
ings  into  a  set  of  static  bindings  that  last  longer  than  a  few 
hundred  milliseconds  (perhaps  even  days  or  weeks).  This 
problem has been addressed in Geib (1990) by using
recruitment learning (Feldman 1982; Shastri 1988a;
Wickelgren 1979) in conjunction with a fast weight-change
process abstractly modeled after long-term potentiation
(Lynch 1986). The proposed solution allows a
one-shot  conversion  of  dynamic  facts  into  a  structurally 
encoded  fact  in  the  presence  of  a  “learn”  signal.  It  is 
envisaged  that  subsequently,  such  medium-term  struc¬ 
tures  can  be  converted  into  long-term  structures  by  other 
processes  (Marr  1971;  Squire  1987;  Squire  &  Zola- 
Morgan  1991).  The  notion  of  fast  synapses  proposed  by 
von  der  Malsburg  (1981)  may  also  play  an  intermediate 
role  in  sustaining  memories  that  must  last  beyond  a  few 
hundred  milliseconds. 

10.6. Learning rules

The  problem  of  learning  the  representation  of  rules  in  a 
system  that  uses  a  temporal  representation  is  no  more 
difficult  than  the  problem  of  learning  structured  repre¬ 
sentation  in  connectionist  networks.  Instead  of  being 
triggered  by  “simple”  coactivation,  learning  must  now  be 
triggered  by  synchronous  activation.  Recently,  Mozer  et 
al.  (1991)  have  demonstrated  how  backpropagation  style 
learning  may  be  generalized  to  networks  of  nodes  that  are 
essentially like ρ-btu nodes. We are addressing the problem
of learning in the context of preexisting predicates
and concepts where it is desired that the cooccurrence of
events should lead to the formation of appropriate connections
between predicate arguments. A special case
involves assuming generic interconnections between
predicate arguments, and viewing rule learning as learning
the correct type restrictions/preferences on argument
fillers. This may be achieved by modifying weights on
links between the type hierarchy and the rule component
(see sects. 5.4 & 5.5).

NOTES

1. For example, see Allen 1987; Bobrow & Collins 1975;
Charniak 1976; Corriveau 1991; Dyer 1983; Fahlman 1979; Just
& Carpenter 1977; Kintsch 1974; Lehnert & Ringle 1982;
Norvig 1989; Schank & Abelson 1977; Wilensky 1983.

2.  That  reflexive  reasoning  occurs  spontaneously  and  without 
conscious effort does not imply that the agent cannot become
aware and conscious of the result of such reasoning, as in an
agent's yes response to the question Does John own a car? given
“John bought a Rolls-Royce.” In many situations, however, the
result  of  reflexive  reasoning  may  only  be  manifest  in  the  mental 
state  of  the  agent.  For  example,  during  reading  the  effect  of  such 
reasoning  may  be  manifest  primarily  in  the  agent’s  sense  of 
understanding,  coherence  (or  lack  thereof),  disbelief,  humor, 
and  so  forth. 

3. The reflexive/reflective distinction we make here in the
context of reasoning shares a number of features with the
automatic/controlled distinction proposed by Schneider and
Shiffrin (Schneider & Shiffrin 1977; Shiffrin & Schneider 1977;
see also Posner & Snyder 1975). Like automatic processing,
reflexive reasoning is parallel, fast, occurs spontaneously, and
the  agent  is  unaware  of  the  reasoning  process  per  se.  However, 
the  working  memory  underlying  reflexive  reasoning  has  specific 
capacity  limitations  (see  sect.  8.2).  In  formulating  the  problem  of 
reflexive  reasoning  and  developing  a  detailed  computational 
model  for  it,  we  have  generalized  the  notion  of  automatic 
processing  by  bringing  into  its  fold  the  more  conceptual  task  of 
systematic  reasoning. 

4. If we assume that information is encoded in the firing rate
of a neuron then the amount of information that can be conveyed
in a “message” would depend on ΔF, the range over which the
firing frequency of a presynaptic neuron can vary, and ΔT, the
window of time over which a postsynaptic neuron can “sample”
the incident spike train. ΔT is essentially how long a neuron can
“remember” a spike and depends on the time course of the
postsynaptic potential and the ensuing changes in the membrane
potential of the postsynaptic neuron. A plausible value of
ΔF may be about 200. This means that in order to decode a
message containing 2 bits of information, ΔT has to be about 15
msec, and to decode a 3-bit message, it must be about 35 msec.
One could argue that neurons may be capable of communicating
more complex messages by using variations in interspike delays
to encode information (see, e.g., Strehler & Lestienne 1986).
However, Thorpe and Imbert (1989) have argued that in the
context of rapid processing, the firing rate of neurons relative to
their available time to respond to their inputs implies that a
presynaptic neuron can only communicate one or two spikes to a
postsynaptic neuron before the latter must produce an output.
Thus the information communicated in a message remains
limited even if interspike delays are used as temporal codes.
This does not imply that networks of neurons cannot represent
and process complex structures. Clearly they can. The interesting
question is how.

5. This observation does not presuppose any particular encoding
scheme and applies to localist and distributed, as well as
hybrid, schemes of representation. The point is purely numerical
- any encoding scheme that requires n^2 nodes to represent
an LTKB of size n will require 10^16 nodes to represent an LTKB
of size 10^8.

6. This hypothesis does not conflict with the fan effect (Anderson
1983; see also sect. 8.6).

7. The rules used in this and other examples are only meant
to illustrate the dynamic-binding problem and are not intended
to be a detailed characterization of commonsense knowledge.
For example, the rule relating “giving” and “owning” is an
oversimplification and does not capture the richness and complexity
of the actual notions of giving and owning.

8. Although systematicity has broader connotations (e.g., see
Fodor & Pylyshyn 1988a), we use it here to refer specifically to
the  correspondence  between  predicate  arguments  stipulated  by 
rules. 

9. The symbol ∀ is the universal quantifier, which may
formally be interpreted to mean “for all,” and the symbol ⇒ is
the logical connective “implies.” Thus the statement ∀u,v
[buy(u,v) ⇒ own(u,v)] asserts that for any assignment of values
to u and v, if u buys v then u owns v.

10. A similar formation of “static” bindings occurs in any
learning network with hidden nodes. Observe that a hidden
node at level l learns to respond systematically to the activity of
nodes at levels l - 1 and below, and in so doing the network
learns new bindings between representations at levels l and
l - 1. These bindings, however, are static, and the time it takes
for  them  to  get  established  is  many  orders  of  magnitude  greater 
than  the  time  within  which  dynamic  bindings  must  be 
established. 

11.  Feature  binding  can  be  achieved  by  creating  sets  of 
features  such  that  those  belonging  to  the  same  entity  are  placed 
in  the  same  set.  In  terms  of  expressive  power,  unary  predicates 
suffice  to  solve  this  problem.  For  example,  the  grouping  of 
features belonging to a “red smooth square” and a “blue dotted
circle” can be expressed by using unary predicates such as
red(obj1) ∧ smooth(obj1) ∧ square(obj1) and blue(obj2) ∧
dotted(obj2) ∧ circle(obj2).

12.  We  first  described  our  proposed  model  in  1990  (Shastri  & 
Ajjanagadde  1990).  An  earlier  version  using  a  central  clock  was 
reported  in  Ajjanagadde  and  Shastri  (1989). 

13. As stated in Note 11, unary predicates suffice to solve the
feature-binding problem and the expressive power of the
models cited above is limited to unary predicates (see Hummel
& Biederman 1991). The greater expressive power provided by
n-ary predicates would eventually be required by more sophisticated
models of visual processing.

14. There are other variants of marker passing (see, e.g.,
Charniak 1983; Hendler 1987; Hirst 1987; Norvig 1989) where
“markers” are even more complex messages containing a marker
bit, a strength measure, backpointers to the original and immediate
source of the marker, and sometimes a flag that indicates
which types of links the marker will propagate along. The
marker-passing system has to process the information contained
in markers, extract paths traced by markers, and evaluate the
relevance of these paths. In view of this, such marker-passing
systems are not relevant to our discussion.

15. We can generalize the behavior of a ρ-btu node to account
for weighted links by assuming that a node will fire if and only if
the weighted sum of synchronous inputs is greater than or equal
to n (see sects. 5.5 & 8.1).
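The weighted firing rule in Note 15 can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `firing_phases`, the use of integer phase labels to stand for "synchronous" inputs, and the sample weights are all hypothetical.

```python
from collections import defaultdict


def firing_phases(inputs, threshold):
    """Return the sorted phases in which a node would fire.

    inputs: iterable of (phase, weight) pairs, one per active incoming link.
    Inputs are treated as synchronous only if they carry the same phase label
    (an assumption made for this sketch).
    """
    total = defaultdict(float)
    for phase, weight in inputs:
        total[phase] += weight  # sum the weights of inputs arriving in phase
    # The node fires in a phase iff the weighted sum reaches the threshold n.
    return sorted(p for p, s in total.items() if s >= threshold)


if __name__ == "__main__":
    links = [(0, 0.6), (0, 0.5), (1, 0.7), (2, 1.0)]  # hypothetical inputs
    print(firing_phases(links, threshold=1.0))
```

With a threshold of 1.0, the two inputs in phase 0 jointly reach the threshold, the single input in phase 1 does not, and the input in phase 2 reaches it alone.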

16. In the idealized model each argument is encoded as a
single ρ-btu node and, hence, it is reasonable to assume that a
node may fire in response to a single input. The thresholds of
nodes in the ensemble-based model will be higher and will
depend on the average interensemble connections per node.

17. A constant refers to a specific entity in the domain. The
symbol ∃ is the existential quantifier, which may be interpreted
to mean “there exists.” Recall that the symbol ∀ is the universal
quantifier, which may be interpreted to mean “for all.” Thus the
statement ∀x [person(x) ⇒ ∃z mother(z,x)] asserts that for
every person x there exists some z such that z is the mother of x.
The symbol ∧ is the logical connective “and.”

18. The system can encode first-order, function-free Horn
Clauses with the added restriction that any variable occurring in
multiple argument positions in the antecedent of a rule must
also appear in the consequent. Horn Clauses form the basis of
PROLOG, a programming language used extensively in artificial
intelligence (see, e.g., Genesereth & Nilsson 1987).

19. This time consists of (1) lπ, the time taken by the
activation originating at the enabler of the query predicate to
reach the enabler of the predicate(s) that are relevant to the
derivation of the query, (2) π, the time taken by the relevant
fact(s) to become active, (3) π, the time taken by the active fact(s)
to activate the relevant collector(s), and (4) lπ, the time taken by
the activation to travel from the collectors of the relevant
predicate(s) to the collector of the query predicate.

20. The closed-world assumption simply means that any fact
F that is neither in the knowledge base nor deducible from the
knowledge base may be assumed to be false.
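The closed-world assumption in Note 20 can be sketched as a query procedure that answers "false" for anything it cannot find or derive. This is a minimal sketch under simplifying assumptions: facts and rules are ground (variable-free) strings, derivation is plain forward chaining, and the function name `holds` and the sample facts are hypothetical rather than taken from the system described in the article.

```python
def holds(query, facts, rules):
    """Closed-world query over ground facts.

    facts: iterable of fact strings stored in the knowledge base.
    rules: list of (antecedents, consequent) pairs over the same strings.
    Returns True if the query is stored or derivable, and False otherwise.
    """
    known = set(facts)
    changed = True
    while changed:  # forward-chain until no new fact can be derived
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and all(a in known for a in antecedents):
                known.add(consequent)
                changed = True
    # Closed world: absence from the derivable set is treated as falsity.
    return query in known


if __name__ == "__main__":
    facts = {"give(John, Mary, Book1)"}
    rules = [(["give(John, Mary, Book1)"], "own(Mary, Book1)")]
    print(holds("own(Mary, Book1)", facts, rules))  # derivable from the rule
    print(holds("own(John, Book1)", facts, rules))  # neither stored nor derivable
```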

21. Here we are using concept to refer only to the entities and
types encoded in the hierarchy. This is not to suggest that
predicates such as give and own that are not represented in the
IS-A hierarchy are not concepts in the broader sense of the
word.

22.  In  our  formulation,  each  IS-A  link  is  strict  and  only 
property  values  are  exceptional.  This  approach  for  dealing  with 
exceptional  and  defeasible  information  in  IS-A  hierarchies  is 
explained  in  Shastri  (1988a). 

23.  This  is  required  because  a  fact  is  true  of  some  entity  of 
type C if one or more of the following holds: (1) the fact is
universally true of a superconcept of C, (2) the fact is true of
some subconcept/instance of C, or (3) the fact is universally true
of  a  superconcept  of  a  subconcept/instance  of  C.  The  last  is 
required  if  concepts  in  the  IS-A  hierarchy  can  have  multiple 
parents. 

24. These times are approximate because the time required
for propagation along the IS-A hierarchy and the rules may
overlap and, hence, the actual time may be less. For example,
the time to perform a predictive inference may also only be
max(l,Tr, It is also possible for the actual time to be
greater, because in the worst case it may take up to eight cycles
instead of three to traverse an IS-A link.

25. The number of antecedent predicates (m) in a rule can
also be reduced by introducing ancillary predicates. For example,
the rule ∀x,y,z P(x,y,z) ∧ Q(x,y,z) ∧ R(x,y,z) ⇒ S(x,y,z)
may be replaced by two rules, each of which has only two
antecedent predicates: ∀x,y,z P(x,y,z) ∧ Q(x,y,z) ⇒ S1(x,y,z) and
∀x,y,z S1(x,y,z) ∧ R(x,y,z) ⇒ S(x,y,z). The benefit of reducing
m in this manner has to be weighed against the cost of introducing
an additional predicate in the system. But the savings
outweigh the costs if such a predicate helps in reducing the m
value of several rules.
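The rewriting described in Note 25 can be sketched as a small transformation on rules. This is a hedged illustration only: the function name `split_rule`, the list-of-names rule representation, and the generated ancillary names (S1, S2, ...) are assumptions made here, and the sketch assumes the generated names do not collide with existing predicates.

```python
def split_rule(antecedents, consequent, prefix="S"):
    """Rewrite P1 ^ ... ^ Pm => C as a chain of rules with two antecedents each.

    antecedents: list of predicate names (arguments omitted for brevity).
    Returns a list of (antecedent_pair, consequent) rules.
    """
    if len(antecedents) <= 2:
        return [(list(antecedents), consequent)]
    rules = []
    left = antecedents[0]
    # Each middle antecedent is combined with the running result into a new
    # ancillary predicate; the last antecedent yields the original consequent.
    for i, right in enumerate(antecedents[1:-1], start=1):
        ancillary = f"{prefix}{i}"  # hypothetical ancillary predicate name
        rules.append(([left, right], ancillary))
        left = ancillary
    rules.append(([left, antecedents[-1]], consequent))
    return rules


if __name__ == "__main__":
    for rule in split_rule(["P", "Q", "R"], "S"):
        print(rule)
```

Applied to the rule in the note, the sketch yields the two rules P ∧ Q ⇒ S1 and S1 ∧ R ⇒ S. A rule with m antecedents produces m - 2 ancillary predicates, which is the cost the note says must be weighed against the reduction in m.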

26.  The  reasoning  system  uses  the  phase  of  activation  to 
encode  binding  information.  Hence,  in  principle  the  amplitude 
of  activation  could  be  used  to  represent  the  "strength"  of 
dynamic bindings and rule firings. Note, however, that the
amplitude of a node’s output is encoded by the spiking frequency
and the use of varying frequency to encode rule strengths will
interfere with the encoding of dynamic bindings.

27.  While  the  occurrence  of  synchronous  activity  is  less 
controversial,  the  occurrence  of  synchronized  oscillations  in  the 
animal  brain  and  its  representational  significance  is  still  a  matter 
of  controversy.  More  evidence  is  needed  to  establish  firmly  the 
role  of  oscillatory  activity  in  neural  information  processing. 
Some  researchers  have  reported  difficulty  in  demonstrating 
oscillatory  activity  in  the  primate  visual  system  using  static 
stimuli (e.g., Rolls 1991; Tovee & Rolls 1992). In this context,
however, it must be recognized that a very small fraction of
neurons would be expected to participate in an episode of
synchronous activity. Furthermore, the grouping of neurons
would be dynamic and vary considerably from one episode of
reasoning to another. Hence, synchronous oscillations would be
very difficult to detect.

28. A more detailed model of such coupling has since been
developed (Mandelbaum 1991).

29. These timings were obtained by analyzing the simulations
of the reflexive-reasoning system carried out using a
simulation system developed by Mani (1991). The simulation
system is implemented using RCS - the Rochester Connectionist
Simulator (Nigel et al. 1989).

30. The above behavior generalizes the notion of a “strength”
associated with concepts (cf. Anderson 1983) and extends it to
rules, IS-A relations, facts, and even individual static bindings
in the LTKB.

31. The cost of realizing multiple instantiation of concepts is
considerably lower than that of realizing the multiple instantiation
of predicates. Thus the value of k1 can be higher than three.
Observe however, that k1 need be no more than .

32.  There  are  several  ways  of  encoding  the  relevant  kinship 
knowledge. All these pose the same problem, however: The
antecedent of one of the rules contains a repeated variable that
does not occur in the consequent. One possible encoding of the
relevant knowledge is given below (note that Self refers to the
agent and the rest of the names have been chosen arbitrarily to
complete the example). The long-term facts are
grandfather(George, Self), mother(Susan, Self), and father(George,
Susan). The rule is ∀x,y,z grandfather(x,y) ∧ father(x,z) ∧
mother(z,y) ⇒ maternal-grandfather(x,y).

33.  In  addition  to  the  constraints  on  the  WMRR,  the  number 
of  dynamic  facts  that  can  be  communicated  to  an  agent  at  one 
time  will  be  bounded  by  the  rather  limited  capacity  of  the  overt 
short-term  memory. 

34. Ullman and van Gelder (1988) treat the number of nodes
required  to  encode  the  LTKB  as  a  fixed  cost;  hence  they  do  not 
refer  to  its  size  in  computing  the  space  complexity  of  their 
system.  If  the  size  of  the  LTKB  is  taken  into  account,  the 
number  of  processors  required  by  their  system  turns  out  to  be  a 
high-degree  polynomial. 

35. The relation between our approach and the
pattern-containment approach was pointed out by Geoff Hinton
(personal communication).


Open  Peer  Commentary 


Commentary submitted by the qualified professional readership of this
journal  will  be  considered  for  publication  in  a  later  issue  as  Continuing 
Commentary  on  this  article.  Integrative  overviews  and  syntheses  are 
especially  encouraged. 


Time  phases,  pointers,  rules  and  embedding 

John  A.  Barnden 

Computing Research Laboratory & Computer Science Department, New
Mexico State University, Las Cruces, NM 88003-0001
Electronic mail: jbarnden@nmsu.edu

Binding  by  time  phases  is  an  interesting  special  case  of  the 
following  very  general  (temporary)  binding  method:  To  bind  two 
things,  mark  them  in  roughly  the  same  way.  Let’s  call  this  the 
similar-mark approach. Note that it could apply to nonconnectionist
as well as connectionist systems. In Shastri &
Ajjanagadde’s (S&A’s) case, we may take the marks to be the
oscillatory patterns of excitation acquired by argument nodes
and so on. Two marks are similar enough to constitute a binding
if they have sufficiently similar phases (and frequencies). So,
S&A’s method is a special case of the approach of temporarily
binding connectionist nodes or subnetworks together by
dynamically making them hold activation patterns that are similar
enough in some specific sense. That is, the time-phase method
is a special case of “pattern-similarity association” or PSA
(Barnden & Srinivas 1991). This is in turn a connectionist special case
of the similar-mark approach.

A benefit of considering the time-phase method in the context
of PSA and similar-mark binding in general is that we see the
close relationship to the technique of “associative addressing”
widely used in specialized computer hardware as an alternative
to pointers. With this technique two memory areas can be
temporarily “linked” together by placing identical or sufficiently
similar bit-strings somewhere within them. Such a bit-string
extracted from one place can be used to find the other place or
places that contain that bit-string or suitably similar ones. In
sum, S&A’s system, which is one of the few connectionist
systems that can actually perform inferencing of any respectable
complexity, turns out to rest on a binding scheme quite strongly
related to a conventional computer technique, but without using
any close analogue of pointers.
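The similar-mark idea can be sketched as a lookup by mark rather than by pointer. The sketch below is illustrative only and rests on assumptions introduced here: marks are plain numbers standing in for phases or bit-strings, similarity is an absolute-difference test with a hypothetical `tolerance` parameter, and the node names are invented for the example.

```python
def bound_to(marks, name, tolerance=0.0):
    """Return the other items whose mark is sufficiently similar to name's mark.

    marks: dict mapping an item (e.g., an argument node or entity node) to its
    current mark. No item stores a pointer to any other item; the association
    is recovered purely by comparing marks.
    """
    target = marks[name]
    return sorted(
        other
        for other, mark in marks.items()
        if other != name and abs(mark - target) <= tolerance
    )


if __name__ == "__main__":
    # Hypothetical marks: items sharing a phase are taken to be bound together.
    marks = {
        "give.giver": 1,
        "John": 1,
        "give.recipient": 2,
        "Mary": 2,
        "own.owner": 2,
    }
    print(bound_to(marks, "Mary"))
    print(bound_to(marks, "give.giver"))
```

Because the lookup can start from any item that carries the mark, the same table answers queries from an entity to its argument roles and from an argument role to its entity.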

We are being led here to the question of what happens to the
notion of a pointer when we move away from computers. This
question is examined to some extent by Barnden and Srinivas
(1991). One can define a pointer in a connectionist system to be a
temporary system substate (e.g., activation pattern) that identifies
a permanently existing place in the system. However,
without specific architectural assumptions we cannot say what a
“place” is, other than by resting on the excessively restrictive
option of a place just being a single node or on the excessively
loose option of a place being any subset of the system’s nodes.
Going back to S&A, if the phase assigned to the John node or
assembly, say, were fixed for all time, then phases could be
regarded as pointers, because they would permanently identify
such nodes or assemblies. However, S&A allow phases to be
dynamically assigned, so they are probably radically different
from “pointers” under any usefully narrow construal of that
word. The signature scheme in ROBIN (Lange & Dyer 1989) is
more pointerlike, because signatures are statically assigned.

One major benefit of similar-mark techniques is that they
allow bidirectional binding in two senses. (1) A binding could be
conceptually bidirectional; one might, for instance, say that if
several S&A argument nodes have the same phase they are
bidirectionally bound to each other. (2) A binding, though
perhaps conceptually unidirectional, could be used bidirectionally.
For instance, a node oscillating at a certain phase might
broadcast its oscillation to other nodes, thereby causing
similar-phased nodes to light up in some special way, but the same thing
could be done starting at any of those nodes (see also Touretzky
1990). By contrast, computer pointers can only efficiently be
used in one direction.

A disadvantage of many similar-mark schemes, however, is
that if a binding is conceptually unidirectional, one needs
something extra to specify direction. That something could be
highly implicit in the overall system architecture; thus, the
binding between an S&A argument node and the John node is,
arguably, conceptually unidirectional, and that fact is implicitly
respected in the whole way that the system operates. However,
if one needed unidirectional bindings between argument nodes
for some reason, one would need to do something more than
simply give them the same phase.

A concern I have about many connectionist systems, including
S&A’s, is that they may face difficulty in encompassing
certain important types of reasoning, including some reflexive
types. S&A claim it is unlikely that the input propositions to
reflexive reasoning episodes can be dynamically arising rules. I
take the point of their syllogism example, but there are more
mundane examples that are not so easily disposed of. For
instance, suppose someone says, “All the people at the party
were toothbrush salespersons. Some of them even had their
sample cases with them.” The obvious inference that those cases
contained toothbrushes seems no less a candidate for being
dubbed “reflexive” than do the inferences in S&A’s Little Red
Riding Hood and Colombian drug enforcement agency examples.
Yet one of the input propositions is the universally quantified
one about all the people in the room, and this can be
viewed as a dynamically arising rule.

Now suppose someone says, “Tom thought that the milk was
sour. He went out to buy some more.” One needs to be able to
apply the general knowledge that sour milk tends to be unusable
for certain purposes, together with Tom’s reported thought, in
order to understand Tom’s motive in going out for more milk. Is
such reasoning not reflexive? That is, I am suggesting that input
propositions and reasoning episodes involving them can be
embedded in propositional attitude contexts (among other sorts
of context, such as counterfactual ones) without making the
reasoning nonreflexive. This makes the task of connectionistically
implementing reflexive reasoning yet more complex.

Embedding and dynamically arising rules are discussed further
in Barnden (1992, pp. 149-78). Because very few workers
in connectionism, or indeed critics of connectionism, have even
paid lip service to the issues, my comments are hardly a strike
against S&A specifically.


Plausible  inference  and  implicit 
representation 

Malcolm  I.  Bauer 

Department of Psychology, Princeton University, Princeton, NJ 08544-1010
Electronic mail: malcolm@clarity.princeton.edu

Shastri & Ajjanagadde’s (S&A’s) distinction between reflexive
and reflective reasoning is similar to the distinction between
implicit and explicit reasoning made by Johnson-Laird (1983).
Implicit reasoning is rapid, effortless, and occurs outside
conscious awareness. It is basically model building without the
deliberate search for alternatives. Explicit reasoning requires
deliberate, conscious effort and calls for the search for alternative
models that may invalidate an inference. S&A’s work is
hence an attempt to create a detailed account of implicit reasoning.
They accordingly write that their system “simulates the
behavior of the external world and dynamically creates a vivid
model of the state of affairs resulting from the given situation”
(sect. 3.4). This is an important and innovative area of research
but there are some weaknesses in their theory.

First, S&A avoid the important question of how people make
a limited number of plausible inferences from the vast set of
possible inferences. From any premise, there follow an infinite
number of valid deductions and inductive hypotheses (for example,
continually conjoining the premise with itself generates a
countably infinite set of valid deductions). The mechanisms by
which people reason must greatly constrain the inferences they
draw. For example, Johnson-Laird (1988) proposes that when
people make deductive inferences they draw conclusions that
maintain the semantic information in the premises, and when
they make inductive generalizations they draw conclusions that
increase semantic information. S&A avoid the issue by hand
coding only those inference patterns they judge to be plausible.
Given the premise “John gave Mary book1” S&A decide that
“Mary owns book1” and “Mary can sell book1” are two plausible
inferences. However, the constraints on which inferences are
drawn are not part of their theory, but are in the heads of the
researchers.

Second, S&A’s notion of “model” is problematic. Models
capture the structure of the world. One can perform analogous
operations on models and make inferences about the state of the
world from the new state of the model. An important property of
models is their ability to represent information implicitly. In so
doing, models can represent whole classes of inferences that can
be made explicit, if needed, with further reasoning. This kind of
representation is quite different from encoding selected plausible
inference patterns as in S&A’s system. S&A imply that a
model is a finite collection of plausible inferences that may be
drawn about a situation. When a situation is partially described,
the system constructs a chain of inferences. For example, from
the  premise  “John  drove  from  his  house  to  the  store"  some 
plausible  inferences  that  follow  are,  “John  left  his  house"  and 
“John  arrived  at  the  store.”  Equally  plausible,  though,  are 
inferences such as “John drove at least halfway to the store,”
“John drove at least 1/4 of the way to the store,” and “John drove at
least 1/8th of the way to the store,” and so on. In S&A’s system, a
complete  model  of  the  situation  would  consist  of  an  explicit 
representation of all of the above inferences and an infinity of
other  inferences  that  follow.  [See  also  BBS  multiple  book  review 
of  Sperber  &  Wilson's  Relevance:  Communication  and  Cogni¬ 
tion,  BBS  10(4)1987.]  Although  it  is  clear  that  people  can 
determine  the  validity  of  these  inferences,  it  is  unlikely  that 
they  construct  explicit  representations  for  each  of  the  potential 
inferences.  A  model  of  the  premise  “John  drove  from  his  house 
to  the  store"  is  not  a  series  of  plausible  inferences  that  follow 
from  it,  but  rather  a  representation  that  captures  the  structure  of 
John  driving  from  his  house  to  the  store.  Although  an  infinite 
number  of  inferences  follows  from  this  model,  reflexive  reason¬ 
ing  does  not  involve  constructing  all  the  plausible  inferences; 
rather, it entails constructing a model that represents the
situation  from  which  relevant  inferences  could  be  drawn  as 
necessary. 

In  summary,  Shastri  &  Ajjanagadde  must  overcome  these  two 
problems  (the  lack  of  a  theory  of  plausible  reasoning  in  the 
inference  mechanism  and  the  inability  to  represent  information 
implicitly)  to  make  their  theory  a  more  credible  account  of 
human  reflexive  reasoning. 


Could  static  binding  suffice? 

Paul  R.  Cooper 

Institute for the Learning Sciences, Northwestern University, Evanston, IL
60201
Electronic mail: cooper@ils.nwu.edu

Dynamic variable binding is widely accepted as a serious challenge
for connectionists. Shastri & Ajjanagadde (S&A) have
more than met that challenge here: This is an elegant proposal
with appealing performance characteristics (e.g., independence
of the size of the knowledge base) and equally appealing
compatibilities with results from psychology and neuroscience. But the
endeavor of addressing the challenge directly, as well as the
character of the resulting solution, provokes the desire to
reconsider alternatives to a frontal attack on variable binding and
rule-oriented reasoning.

If  S&A’s  solution  works  and  is  even  elegant,  why  bother  to 
worry  about  whether  there  is  more  to  consider?  First,  possibly 
the  most  interesting  question  for  the  connectionist  enterprise  is 
this: How much can be done in parallel? The traditional
connectionist approach when cross-talk appears inevitable is to share
net  resources  in  time.  Although  S&A’s  solution  can  hardly  be 
construed  as  “sequential,”  it  does  exploit  some  time  sharing. 
Improvements  may  be  possible. 

Second, S&A’s contribution is motivated in large part by a
desire to provide a connectionist explanation for traditional
rule-oriented reasoning. There are two dangers here. What if
rule-oriented reasoning turns out to be unimportant? It is at least
conceivable that an alternative paradigm such as case-based
reasoning (Riesbeck & Schank 1989) may be more useful, with
dynamic variable binding possibly irrelevant. Another possibility
is that although some rule-oriented reasoning may be
necessary, a full-fledged treatment of n-ary predicates is unnecessary
and counterproductive; therefore, much more restricted
mechanisms may be sufficient.

Constraints and feasibility. So-called static binding may suffice
to explain most reflexive reasoning, despite appearances to
the contrary (sect. 2.1.1). The essence of the binding problem is
associating an argument and filler, or variable and value in the
general case. The static-binding solution requires the existence
of a unit for every feasible pairing of variable and value. Such
“binder” units and their connections are stable long-term features
of the network; thus, “static.” However, it is their activation
that indicates the presence of an actual binding. Thus, the
static-binding solution is in general capable of dynamically
associating variables and values during the course of a
computation.

Knee-jerk objections to this idea, motivated by the counterintuitive
nature of unit/value connectionism (Feldman & Ballard
1982)  and  the  seeming  inability  of  a  static  structure  to  support 
universal,  general-purpose  problem  solving,  can  be  discounted. 
Even  the  most  simplistic  scheme,  allowing  each  of  the  n  entities 
in  memory  to  associate  with  any  of  the  others,  requires  only 
O(n^2) nodes. But is even n^2 too large? It is on this point that even
careful connectionists seem to run aground; indeed, this is
apparently one reason why Shastri, who in earlier work
exploited static binding (1988a), developed the current work. The
value  of  n  can  certainly  be  expected  to  be  10®  or  better,  which 
makes n^2 too large compared to the available resources in the
brain. 

This analysis is too simplistic, however. Static binding requires
only that we use a unit for every feasible pairing of
variable and value, or argument and filler. If feasible values for
variables are restricted by exploiting any kind of type or category
knowledge (see also sects. 2.4 & 5.4) and if binder units are
only allocated for feasible values, the number of required units
reduces dramatically. Cooper and Swain (1992) work this idea
out in some detail, for example, for a massively parallel
implementation of arc consistency. If we limit the number of values
per variable to a reasonably large but fixed maximum, the total
number of binder units required is linear in the number of
variables in memory. In other words, it would not be unreasonable
to assume that the node requirement of static binding is at
least close to linear in the size of the knowledge base, and
certainly much less than n^2. Hence static binding can support
the basic task of associating simple variables and values.
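The counting argument above can be made concrete with a short sketch. It is only an illustration: the function names and the sample figures for the number of entities and the per-variable cap are hypothetical and are not taken from Cooper and Swain (1992).

```python
def units_all_pairs(n):
    """Binder units when each of n entities may pair with any other: n squared."""
    return n * n


def units_restricted(num_variables, max_values):
    """Upper bound on binder units when each variable has at most max_values
    feasible fillers; this grows linearly in the number of variables."""
    return num_variables * max_values


if __name__ == "__main__":
    n = 10_000   # hypothetical number of entities or variables
    cap = 50     # hypothetical fixed maximum of feasible values per variable
    print(units_all_pairs(n))        # quadratic requirement
    print(units_restricted(n, cap))  # linear requirement under the cap
```

Doubling n quadruples the first count but only doubles the second, which is the sense in which the restricted scheme is close to linear in the size of the knowledge base.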

The feasibility of static binding becomes less obvious given
the necessity to reason about combinations or compositions of
simple primitives. On one hand, it is obvious that any
value-based encoding just cannot represent all the potential combinations,
of arbitrary order, that may occur. On the other hand,
solutions exploiting only low-order combinations, particularly
pairs (e.g., Feldman 1985), may suffice to explain the simpler
tasks solved in “reflexive” as opposed to “reflective” or overtly
sequential reasoning (which would be an interesting result in
itself). Complex tasks apparently requiring “systematicity” and
“composition,” such as the recognition of structurally composed
objects, are also achievable in this way (e.g., Cooper 1992).
Finally, the propagation of bindings hardly poses an insurmountable
problem. It is exactly the propagation of constraints
that forms the basis of the relaxation process used by most
connectionist networks.

Possibly the subtlest issue concerns representing truly novel
associations. Clearly, associations exist that a restricted
static-binding network cannot represent. Representing such associations
requires structural changes to the network, that is, learning. But
hard learning of entirely new concepts is hardly “reflexive”
reasoning. It requires time, repetition, attention, reflection,
and so on. Thus, it seems reasonable to assume that such hard
structural learning may require special-purpose neural
machinery.

To summarize, it is tempting to suppose that Shastri &
Ajjanagadde have developed the “last word” on variable binding,
and that this irksome challenge can at last be put to rest. But
the possibility of simpler, faster, and more parallel methods has
not yet been ruled out. The overall point here is hardly irrelevant.
That is, if we ignore the cries of variable-users to explain
how to replicate their results and we try hard to construct
systems  that  solve  hard  AI  problems  while  avoiding  the  obvious 
exploitation  of  arbitrary  high-order  combinations  and  frequent 
binding  of  novel  values  to  variables,  we  may  end  up  with  some 
very  interesting  results  indeed. 

From  symbols  to  neurons:  Are  we  there  yet? 

Garrison  W.  Cottrell 

Shastri & Ajjanagadde’s (S&A’s) target article proposes two main
ideas: the direct embodiment of a subset of formal logic as
structured associations between predicates, and the solution to
the binding problem through phase labeling of entities. This is a
truly novel solution to the binding problem: The phase-labeling
approach  avoids  the  trap  of  combinatorial  space  requirements 
implied  by  systems  that  “connect”  the  two  entities  through  a 
path  in  a  network.  When  I  first  saw  this  work,  for  a  fleeting 
moment,  I  wanted  to  become  a  localist  again. 

Having  caught  myself  at  the  brink,  my  critique  will  focus  on 
the  logical  inference  side  of  the  system.  The  notion  of  embodied 
inference  is  not  novel  (Cottrell  1985;  1989;  Hebb  1949;  James 
1890;  Lange  &  Dyer  1989;  Shastri  1988b;  Touretzky  &  Hinton 
1988),  but  the  current  system  claims  to  achieve  better  efficiency 
and coverage. I will consider three aspects: technical adequacy,
neurological  plausibility,  and  finally,  the  pattern-containment 
alternative. 

Technical  adequacy.  It  is  unclear  from  the  target  article 
whether  the  quite  complicated  specific  wirings  used  by  this 
model  correctly  implement  the  inferences  S&A  say  they  do.  For 
a  formal  system,  one  usually  wants  to  prove  soundness  (that  only 
inferences that are entailed by the facts can be derived).¹
Correspondence with the first author has allayed several of my
fears, but the burden is on the authors to prove that this complex
system does not make incorrect inferences. The matter cannot
be  decided  by  inspection. 

A  second  worry  is  the  lack  of  expressiveness.  It  appears  that 
negation  cannot  be  represented,  as  it  has  not  even  been  men¬ 
tioned  in  the  text.  To  address  this,  for  every  predicate  P,  one  can 
simply  add  another  node  (or,  in  this  case,  a  set  of  nodes)  to 
represent the predicate ~P. Inferences involving ~P are then
driven by activation from this node. A consistency gadget
between P and ~P can be constructed to enforce that they do not
both fire when the network has settled. This is the solution used
in the Spock system (Cottrell 1985; 1989), which implements
Reiter’s Default Logic for inheritance (Etherington & Reiter
1983). In S&A’s system, the consistency gadget will also have to
enforce that the bindings of the two representations of the
predicate are the same when enforcing consistency (Spock is
propositional).
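The consistency requirement just described can be sketched in a few lines of Python. This is only an illustration, not the circuitry of Spock or of S&A's network: the dictionary layout, the `~` prefix convention, and the entity names `tweety` and `opus` are assumptions introduced here. What it shows is that once predicates carry argument bindings, a clash exists only when P and ~P are active for the same binding.

```python
from typing import Dict, Set, Tuple

Binding = Tuple[str, ...]  # argument fillers of one predicate instance


def conflicts(active: Dict[str, Set[Binding]]) -> Dict[str, Set[Binding]]:
    """Return, per predicate P, the bindings active for both P and ~P."""
    clashes: Dict[str, Set[Binding]] = {}
    for name, bindings in active.items():
        if name.startswith("~"):
            continue  # negated predicates are looked up from their positive twin
        shared = bindings & active.get("~" + name, set())
        if shared:
            clashes[name] = shared
    return clashes


# Hypothetical settled state: fly(tweety), fly(opus), and ~fly(opus).
state = {
    "fly": {("tweety",), ("opus",)},
    "~fly": {("opus",)},
}
print(conflicts(state))  # {'fly': {('opus',)}}
```

In a purely propositional system the binding sets would collapse to a single flag per predicate, so the check would reduce to "P and ~P are not both on".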

S&A’s claim that they can add defaults (sect. 5.5) and combine
the forward and backward systems (sect. 3.5) merits further
inspection. Care must be taken that added expressiveness not
detract from the efficiency of the system. In the Spock system, I
found a classic example of the expressiveness/tractability
trade-off (Levesque & Brachman 1985). When the ability to compute
the contrapositive is included in the representation (adding
P → Q implies ~Q → ~P is encoded also) there exist sets of facts
where mixtures of defaults and first-order rules cause long
settling times for the network even when there is only one
consistent extension. This was due to the backward system
inferring ~Ps (begun by a short default inference) while the
forward system was inferring Ps through the same long chain
from the other end. Many iterations were necessary to resolve
the conflict. However, in a version where activation was allowed
to spread more permissively (dubbed Dr. Spock), settling times
were more reasonable due to leakage of consistency information
through the network. The use of graded evidence in the current
system to pick the best solution could alleviate this potential
problem.
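A minimal sketch of what encoding the contrapositive amounts to, assuming that rules are simple (antecedent, consequent) pairs of literals and that negation is marked with a `~` prefix. Neither this representation nor the function names come from Spock; they are hypothetical and serve only to make the bookkeeping concrete.

```python
from typing import Set, Tuple

Rule = Tuple[str, str]  # (antecedent, consequent), e.g. ("P", "Q") for P -> Q


def negate(literal: str) -> str:
    """Toggle the '~' prefix on a literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def with_contrapositives(rules: Set[Rule]) -> Set[Rule]:
    """Add ~Q -> ~P for every rule P -> Q in the set."""
    closed = set(rules)
    for antecedent, consequent in rules:
        closed.add((negate(consequent), negate(antecedent)))
    return closed


# Hypothetical two-step chain P1 -> P2 -> P3.
chain = {("P1", "P2"), ("P2", "P3")}
closed = with_contrapositives(chain)
print(sorted(closed))
```

Each original link gains a mirror-image link running the other way over negated literals, so a chain that carries Ps forward also carries ~Ps backward along the same path.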

Finally, one wonders whether the benefits of being able to
express propositions about types (sect. 5) are worth the cost of
adding another system (the IS-A hierarchy) when inheritance
can be expressed as repeated applications of logical inference
(Hayes 1977).

Neuroscientific plausibility. From the point of view of modeling
the brain, this architecture requires a suspension of disbelief
about  how  such  a  system  could  have  been  produced  by  evolu¬ 
tion.  This  criticism  is  weak  because  it  is  founded  on  an  argument 
from  lack  of  imagination.  However,  in  order  for  this  system  to 
work  properly,  highly  specific  connections  must  be  formed  from 
the  nodes  representing  the  concepts  to  the  connections  be¬ 
tween  antecedents  and  consequents.  Things  get  more  compli¬ 
cated  when  one  wants  to  introduce  constants  as  in  Figure  14,  or 
multiple  arguments  as  in  Figure  15.  Learning  in  such  structured 
networks  uses  a  recruitment  rule  (Valiant  1988),  where  preexist¬ 
ing  connections  between  the  appropriate  units  gain  force.  In 
S&A’s  case,  there  must  be  preexisting  connections  from  every 
potential  concept  to  connections  from  every  potential  predicate 
to  fact  nodes,  which  leads  to  another  combinatorial  explosion. 
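The scale of this explosion can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that every concept needs a potential connection to every link between a predicate argument and a fact node. The function name and all the numbers are hypothetical and are not taken from the target article.

```python
def potential_gating_links(
    n_concepts: int,
    n_predicates: int,
    args_per_predicate: int,
    facts_per_predicate: int,
) -> int:
    """Count concept-to-link connections under the all-to-all assumption."""
    # Links running from predicate argument nodes to fact nodes.
    predicate_to_fact_links = n_predicates * args_per_predicate * facts_per_predicate
    # Every concept may have to gate every such link.
    return n_concepts * predicate_to_fact_links


total = potential_gating_links(
    n_concepts=1_000, n_predicates=100, args_per_predicate=2, facts_per_predicate=10
)
print(total)  # 2000000
```

The count grows with the product of the vocabulary sizes, which is the sense in which the preexisting connectivity is combinatorial.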

The pattern-containing inference alternative. The possibility of
a  system  of  pattern-containing  inference  (sect.  9.4)  is  a  useful 
one  to  pursue.  This  is  the  idea  that  a  system  of  embodied 
inferences  like  the  one  proposed  in  the  target  article  could  be 
constructed  that  passes  distributed  patterns  of  activation  from 
antecedent  to  consequent  in  slots  for  the  arguments.  It  is  clear 
that  pattern-containing  inference  could  also  use  the  phase¬ 
labeling  mechanism  to  maintain  multiple  separate  bindings  for 
the  same  pattern. 

There  are  interesting  advantages  to  using  pattern-containing 
embodied inference rules: (1) It should be possible to learn rules
using back propagation or some similar technique between
antecedents and consequents; (2) semantic filters would be
embodied in the associations: The copy of predicate arguments
from antecedent to consequent is essentially through an
autoencoder network (e.g., as in Hanson & Kegl 1987), thus only
patterns similar to the ones that have appeared in exemplars
would be allowed through. Semantic restrictions on arguments
would therefore be handled completely locally, based on
experience with this inference, rather than on constraints that
must be enforced by connections from the IS-A system, as S&A propose.
This  is  inherently  more  efficient.  Also,  covariance  constraints 
between  arguments  would  naturally  be  enforced. 

Whether  or  not  one  rejects  Shastri  &  Ajjanagadde’s  system  for 
the  above  reasons,  it  is  an  undeniable  achievement  of  this  work 
that  it  has  brought  to  light  a  bold  new  idea  for  solving  the 
binding problem with processes available in the brain. The
notion of phase labeling of entities is a powerful one, and here for
the first time we have a demonstration of its viable use.

NOTE 

1. Completeness (that all sound inferences can be derived) is not at
issue here.

Shastri & Ajjanagadde (S&A) must clearly be expecting a lot of company at
their  house  in  the  woods,  for  they  not  only  believe  that  they  have 
built  a  better  mousetrap  -  an  effective  and  systematic  reasoner  - 
they  also  believe  that  this  mousetrap  is  of  a  different  kind  -  a 
rule  instantiating,  biologically  plausible  connectionist  device.  A 
closer  scrutiny  of  their  model  reveals,  however,  that  not  only  is  it 
the  same  type  of  mousetrap  that  classical  AI  has  been  springing 
for  decades,  but  that  it  is  also  not  quite  as  good. 

Three  different  weaknesses  suggest  that  the  proposed  model 
is  classical  in  nature.  First,  the  target  article  argues  that  the 
reflexive  reasoner  does  not  require  a  central  controller,  in 
contrast to classical systems. This cannot be true. The system
can only be understood as providing a yes/no answer to questions
like, “Can Mary sell Book1?” by having an external controller
that understands (and remembers) the original question, and
also knows that a response will be encoded as activity in
c:can-sell. The system cannot autonomously make sense of the
blooming, buzzing confusion of its own activity. For instance,
Figure 13 illustrates that at one point in time the nodes
c:can-sell, c:own, and c:give (representing different possible
answers), and the nodes e:can-sell, e:own, and e:buy
(representing different possible questions) are all
simultaneously active. As a result, the network cannot “know”
what answer it is giving, nor the original question that was
posed, without external interpretation.

Second,  the  target  article  argues  that  the  reflexive  reasoner  is 
not  a  classical  system  because  it  is  rule  instantiating.  This 
amounts  to  a  standard  connectionist  claim  that  network  archi¬ 
tectures  do  not  clearly  demarcate  processes  from  data  structures 
(Dawson  &  Schopflocher  1992).  This  claim  is  clearly  not  true  of 
the proposed model: The rules governing system inferences are
qualitatively  different  and  are  represented  separately  from  the 
data  structures  being  processed,  as  S&A  conveniently  illustrate 
with  the  “squiggly  line"  in  Figure  19. 

Third,  it  is  claimed  that  the  reflexive  reasoner  -  unlike 
classical  systems  -  is  biologically  plausible.  This  too  is  far  from 
established.  Although  it  is  quite  interesting  that  temporal  syn¬ 
chrony  has  been  observed  in  the  cortex,  many  more  specific 
claims  are  not  defended  in  the  target  article.  Several  different 
and highly specific neural circuits are proposed (e.g., Figures
14, 21, 23, 25). In addition, a number of qualitatively different
processing units - including three different kinds of tau-or units
- are required. A great deal of further evidence from
neuroscience is needed to support such claims.

Biological  plausibility  is  further  weakened  in  the  context  of 
speculations about how the network might learn facts or rules. If
the learning of facts requires the presence of an external “learn”
signal, then this strongly indicates that the network is not
autonomous and thus is far from being neurally plausible. In
addition, although it may be true that the learning of rules in the
reflexive reasoner is no more difficult than learning in
nontemporal connectionist systems, it is certainly biologically
implausible - particularly if backpropagation is used (see
Grossberg 1987).

The  three  arguments  above  suggest  that  the  proposed  model 
is not a mousetrap of a different kind. But is it a better
mousetrap? Although S&A have shown that a number of the functions
found in traditional reasoning systems can be implemented by
their novel parallel architecture, their network suffers from
some severe logical limitations. Regrettably, these serve both to
compromise its inferential power and to cast further doubt upon
its putative biological plausibility.

The proposed model has two significant difficulties in dealing
with variables that occur in multiple argument positions. First,
in backward reasoning it cannot use rules in which a variable
occurs in multiple argument positions in a rule’s antecedent
when the variable does not also appear in the rule’s consequent.
Second, in forward (or predictive) reasoning, rules in which
variables occur in multiple argument positions in the
consequent cannot be used unless those variables are also present
in the antecedent of the rule and are bound in the reasoning
process. Although these two limitations are acknowledged in the
target article (sect. 8.2.5), S&A fail to note the full extent of the
problems they produce (e.g., with respect to reflexivity).

These  are  not  the  only  logical  difficulties  from  which  the 
system  suffers.  For  example,  the  proposed  IS-A  hierarchy 
cannot  handle  facts  or  queries  in  which  existential  quantifiers 
fall within the scope of universal quantifiers. More significant,
the  network  has  considerable  difficulties  in  handling  multiple 
instantiations  of  the  same  predicate  (it  also  has  some  lesser 
difficulties  with  multiple  instantiations  of  the  same  concept). 

To  provide  a  convincing  solution  to  the  latter  problems,  a 
considerable  number  of  additional  nodes  would  be  required. 
Unfortunately,  this  would  slow  the  system's  performance  signifi¬ 
cantly.  As  a  result,  S&A  compromise  and  limit  the  number  of 
multiple  instantiations  of  predicates  to  just  three.  However, 
they  do  not  offer  any  evidence  that  this  limit  is  either  biolog¬ 
ically  or  psychologically  plausible.  As  this  limit  is  critical  for 
calculating  the  latency  of  network  operations,  their  claims  about 
the  biological  plausibility  of  the  system’s  speed  must  be  treated 
with  a  degree  of  suspicion. 

This  difficulty,  considered  in  conjunction  with  the  other 
logical  limitations  of  their  network,  gives  grounds  for  believing 
that  the  proposed  model  is  in  fact  less  powerful  than  traditional 
systems.  For  example,  none  of  the  logical  limitations  mentioned 
above  affect  the  classical  backward-reasoning  system  described 
by  Pelletier  (1982).  In  short,  Shastri  &  Ajjanagadde  have  not 
built  a  better  mousetrap  -  perhaps  they  should  get  a  cat! 

Shastri & Ajjanagadde’s (S&A’s) approach is remarkable in many
ways.  They  offer  efficient  reasoning  in  a  connectionist  knowl¬ 
edge  representation  system.  This  representation  system  has  an 
expressiveness  that  facilitates  the  realization  of  a  number  of 
knowledge  structures  (frames,  scripts,  etc.).  Furthermore,  they 
present a model that makes a number of predictions about
psychological  processes  and  therefore  allows  experimental  veri¬ 
fication  (or  falsification).  But  most  important,  S&A  attempt  to 
close  the  gap  between  high-level  reasoning  and  neural  process¬ 
ing  and  show  how  “reflexive”  inferences  can  be  drawn  efficiently 
with  slow  neural  elements. 

It  is  natural  that  a  model  that  covers  a  wide  range  of  phenom¬ 
ena  cannot  be  equally  specific  and  appropriate  in  every  detail.  A 
few points that deserve attention should be mentioned.

Learning. S&A do not provide an answer to the question of
learning. Although this seems to be a research strategy that is
generally accepted in the AI community, the lack of learning is a
problem for S&A (sections 10.5 & 10.6 speak about learning in
very vague and general terms). In their model, reasoning is
based on complex, domain-dependent networks and it is an
important question how these network structures are generated.
In particular, if S&A extend their claim of neural plausibility
to learning, they should answer the question of how
complex network structures can be generated based on
biologically plausible, namely sparsely connected, neural networks
(Diederich 1992; Stevens 1989). Furthermore, although in their
model limited reasoning can be done efficiently, the generation
of the networks that allow these reasoning processes can be
complex or even hard

Recent neurobiological findings have shown that there are
considerable changes in the receptive field organization of
cortical cells in adult cats and primates (cf. Merzenich et al.
1988). These changes are triggered by the absence of input
(Gilbert & Wiesel 1992; Merzenich et al. 1988) or experience
(e.g., tactile stimulation, Merzenich et al. 1988). These
processes are relatively fast. The modification of the receptive
field organization of cells can be noticed within a few minutes.
Gilbert  and  Wiesel  (1992,  p.  152)  assume  that  dynamic  changes 
in  receptive  field  structure  may  occur  continuously  during 
normal  vision. 

On  the  connectionist  level,  these  processes  are  modeled  by 
"recruitment  learning”  methods.  Adding  learning  techniques 
such  as  recruitment  learning  to  S&A's  model  can  help  to  replace 
some  artificial  components  of  the  system,  for  example,  the 
switching devices and the allocation of free memory banks
for  multiple  instances,  with  biologically  plausible,  structure¬ 
changing  learning  methods. 

Grounding. Over the past several years several authors (e.g.,
Pfeifer & Verschure 1992) have pointed out that we cannot
understand  a  cognitive  system  without  a  connection  to  sensory 
experience.  That  is,  mental  states  (instantiated  predicates  in 
S&A’s  system)  develop  out  of  real  interactions  with  the  physical 
world.  The  important  point  is  that  it  is  not  sufficient  to  link  a 
high-level  reasoning  system  with  a  sensory  system  in  order  to 
realize  such  a  connection,  but  that  the  conceptual  representa¬ 
tion  itself  is  the  result  of  interactions  with  the  environment  and 
includes  sensory  pathways  that  are  at  least  partially  modified  by 
experience.  A  connectionist  reasoning  system  without  any 
learning  does  not  allow  concept  formation  in  this  sense  and  is 
therefore  restricted  as  a  psychological  model. 

Neural plausibility. Although some aspects of the model are
biologically plausible (synchronous activity, fast response times,
etc.), some components are purely functional elements. The
“enabler” and “collector” units as well as the “τ-and” and “τ-or”
units are used to allow reasoning and are not immediately
plausible  neural  elements.  The  same  holds  for  the  connectivity 
pattern  for  long-term  knowledge  base  (LTKB)  facts.  As  a  matter 
of  fact,  the  unit  types  and  network  structures  are  excluded  from 
the  discussion  of  biological  plausibility  (section  7  speaks  about 
“synchronous,  rhythmic  activity”  only).  It  is  possible  to  find 
neurobiological  evidence  for  “relay  units,”  and  so  on  (cf.  Singer 
1987),  and  therefore  the  claim  of  biological  plausibility  can  be 
extended.  Network  elements  such  as  unit  types  and  connection 
patterns  must  be  part  of  the  discussion  of  neurobiological 
plausibility. 

S&A  refer  to  neurobiological  findings  that  the  temporal  syn¬ 
chronous  activity  of  cells  in  the  cat’s  visual  cortex  supports  the 
dynamic  binding  of  visual  features  of  objects.  In  other  words, 
the  temporal  synchrony  of  neural  firings  supports  pattern  recog¬ 
nition  and  vision.  What  about  explicit,  reflexive  reasoning?  It  is 
not  at  all  obvious  that  the  neurophysiological  evidence  for  the 
dynamic binding of visual features carries over to multistep
reasoning. 

Conclusion.  Biological  plausibility,  efficient  reasoning,  and 
the ability to make a number of predictions that allow
psychological testing are the outstanding features of Shastri &
Ajjanagadde’s system. The absence of learning is disappointing
and it is an important question how complex network structures
can be generated efficiently with biologically plausible,
connectionist methods. Apparently, there is work under way on
learning and an application of the system to natural language
processing. This future work will show if some of the remaining
problems can be solved.

Shastri & Ajjanagadde (S&A) have done a remarkable job in
modeling certain aspects of reflexive reasoning. However, some
evaluation of their representations, as well as their solution to
the binding problem, seems appropriate to put S&A’s model in
perspective with respect to its being “connectionist.”

The  motivation  for  S&A’s  introducing  the  idea  of  synchronous 
firings  of  nodes  was  the  so-called  binding  problem.  Even  though 
this problem was pointed out early (see their references in sect.
2.1.1), it has really been brought into the foreground in the light
of  critiques  of  connectionist  representations  in  neural  networks 
(Fodor  &  Pylyshyn  1988b).  According  to  these  critiques,  con¬ 
nectionist  patterns  of  activation  are  merely  sets  of  numbers 
lacking  any  structure.  They  therefore  cannot  naturally  repre¬ 
sent  complex  conceptual  relations  where  different  concepts 
have to be bound to their roles. In classical AI, due to the
prevalent use of syntactic symbol structures, this was a
nonissue. There, binding can easily be defined by assigning roles to
syntactic position. S&A’s response to the binding problem in
connectionist networks is as remarkable as it is powerful, but it is
also a very “classical” one. Defining binding through
synchronous firings of nodes is identical to the syntactic solution in
traditional AI systems. The remarkable thing about it is that the
representation is moved into the temporal dimension. An
example: Having p-seller and Mary, as well as cs-obj and Book1 fire
synchronously, respectively (example from sect. 3.1), is the
same as representing this relation by using pairs of symbols in
parentheses, meaning that the concepts in each pair are bound,
for example ((p-seller Mary)(cs-obj Book1)).

The  difference  is  that  here  only  the  spatial  dimension  is  used 
for  concatenating  symbols,  something  that  is  obviously  not 
possible  in  common  neural  networks  (as  units  in  a  network 
cannot be repositioned). Now ((p-seller Mary)(cs-obj Book1))
does not seem to be the usual way of defining relations in the
symbolic approach, as often a simpler form is used, such as
can-sell(Mary, Book1). The binding in the latter representation is
defined  by  implicitly  assigning  the  two  roles  p-seller  and  cs-obj 
to  the  first  and  second  position  in  the  predicate,  respectively. 
This  may  be  possible  in  S&A’s  approach  as  well  (such  as  defining 
that  the  first  concept  firing  is  the  p-seller,  and  the  second  one 
the  cs-obj),  but  it  would  make  many  things  (such  as  mapping 
corresponding  roles  in  different  predicates)  more  difficult.  In 
conclusion,  we  can  say  that  S&A’s  solution  to  the  binding 
problem  is  the  syntactic  solution  of  expressing  concept  rela¬ 
tions,  exploiting  the  temporal  dimension  in  a  clever  and,  as  it 
turns  out,  even  in  a  neurally  plausible  way. 
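The claimed equivalence between synchronous firing and syntactic pairing can be illustrated with a small sketch. It assumes, hypothetically, that each active node is tagged with an integer phase slot and that the set of role nodes is known in advance. The function name and the data layout are inventions for this illustration and are not part of S&A's model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def bindings_from_phases(
    phase_of: Dict[str, int], roles: Set[str]
) -> List[Tuple[str, str]]:
    """Read off (role, filler) pairs from nodes that fire in the same phase."""
    by_phase: Dict[int, List[str]] = defaultdict(list)
    for node, phase in phase_of.items():
        by_phase[phase].append(node)

    pairs: List[Tuple[str, str]] = []
    for phase in sorted(by_phase):
        in_phase = by_phase[phase]
        role_nodes = sorted(n for n in in_phase if n in roles)
        fillers = sorted(n for n in in_phase if n not in roles)
        # Every role firing in this phase is bound to every filler in it.
        pairs.extend((r, f) for r in role_nodes for f in fillers)
    return pairs


# The example from sect. 3.1: p-seller fires with Mary, cs-obj with Book1.
phase_of = {"p-seller": 0, "Mary": 0, "cs-obj": 1, "Book1": 1}
roles = {"p-seller", "cs-obj"}
print(bindings_from_phases(phase_of, roles))
# [('p-seller', 'Mary'), ('cs-obj', 'Book1')]
```

The list that comes out is exactly the parenthesized pair notation; the phase index plays the part that position in the string plays in a symbolic expression.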

Recent literature, however, has suggested that the syntactic
solution need not be the only one (e.g., van Gelder 1990); it just
happened to be the obvious one for symbolic models.
Connectionist models - if taken in a much more general sense than in
S&A’s work - have the power to represent conceptual structures
in a nonsyntactic, that is, superpositional way (Chalmers 1990;
Pollack 1990; Sharkey 1992). The problem with the literature on
this topic is that most such approaches are tested on problems
equivalent to what the syntactic solution can achieve (e.g., the
transformation of parse trees, as in Chalmers 1990). Thus I will
briefly report results from our own work, which demonstrate
that the superpositional way of representing compositional
structure can lead to a much bigger step toward neurally and
psychologically plausible models of reflexive reasoning.

The key to such a further step is what we call “soft
compositionality.” Connectionist literature on concept formation and
distributed representations has shown that neural networks can
implement “soft” concepts (Hinton 1986; Smolensky 1988) and
rules (McMillan et al. 1991). Both can be viewed as fuzzy and
analog entities which only in certain situations become discrete
(e.g., in the process of unambiguous recognition of objects).
S&A’s model, on the level of single concepts and rules, could be
viewed as a post-hoc approximation of such concept structures,
explaining the situations where their discreteness counts. We
want to argue that this cannot easily be said about bindings of
concepts to their roles, as there should be a continuum between
composite  structures  consisting  of  several  concepts  and  their 
roles,  on  one  hand,  and  holistic  unstructured  concepts,  on  the 
other. 

Consider the following example of objects and their spatial
relations to each other. Suppose I enter a room that contains,
among other objects, a table with a chair on top. It is clear that I
could represent this by a structure like on-top(chair, table) or
((chair upper)(table lower)). But when I have to reason
reflexively, it will depend on the situation whether it really matters
that there are two objects in a relation to each other. If I want to
screw in a light bulb, I might deal with the whole thing as one
concept (such as ladder). If my goal is not to bump into anything,
it might be no concept at all, only a “fuzzy blob” causing some
motor reaction. The hypothesis now is that in reflexive
reasoning (exactly the kind S&A want to model) there can be a
continuum between complex structures and holistic concepts in
the representations used for reasoning. In other words,
something might be represented best as a complex structure, as one
whole, or as anything “in between.” The important situations
are the latter ones. I might have screwed in the light bulb and
gone out of the room inclined to say that I dealt with one object
in there. The question “Isn’t there anything I can sit on?”
however, can push my vague compositional representation of
the two objects above threshold to permit the answer “yes, I
guess I saw a chair on top of something else” (perhaps adding,
“or was it underneath?”). In syntactic models such as S&A’s,
complex predicates are always complex predicates, and roles are
always distinct. S&A do hint about a “soft” variant of rules (sect.
5.5), but “soft” compositionality in our sense would go beyond
that, in that the existence of roles itself can be fuzzy and analog.

Dorffner and Rotter (1992) present a little model that, in a first
step, partially achieves “soft” compositionality. It is based on the
so-called binding vector (BV, Rotter & Dorffner 1990), which
achieves superpositional representations similar to RAAM
(Pollack 1990), but without prior learning. By starting with spatial
relations and sensory input it also accounts for learning and
grounding (another aspect where S&A have disappointingly
little to say). The most interesting aspect with respect to S&A’s
work is that binding in the BV is also achieved by synchronous
activation of concept and role. The difference is that after this,
activations are sustained and superimposed onto each other and
thus do not need to stay distinct.

In  final  conclusion,  I  want  to  argue  that  S&A’s  model  of 
reflexive reasoning falls short of what other types of
connectionist models could be capable of achieving. To be honest, research
in  self-organizing,  distributed  networks  is  still  in  its  infancy  and 
cannot  directly  compete  with  systems  as  complex  as  the  ones 
S&A  propose.  Most  of  what  has  been  said  here  must  therefore 
remain  a  claim  people  working  on  those  models  will  have  to  live 
up  to.  If  they  will,  we  could  say  that  S&A’s  model,  although 
powerful in many respects, is still much more symbolic than it is
connectionist. 

Does the S&A model use binding mechanisms similar to those of
the brain? It is not clear from our experiences with the visual
cortex whether the model of Shastri & Ajjanagadde (S&A) will
still function as required if neural network dynamics are
included that are similar to those of cortical neurons. This is called
into question when we compare the rhythmic activities of the
model and that of a real cortex. The most striking differences are
the completely different dynamics (Eckhorn et al. 1988; 1990).
Although  the  S&A  model  uses  a  simple  phase-delay  scheme  for 
labeling  and  binding  different  entities  and  although  it  is  driven 
by  a  rhythmic  input,  synchronized  oscillations  in  the  cortex  are 
probably  due  to  a  self-organizing  process  among  mutually 
coupled  neurons;  systematic,  stimulus-specific  phase  delays  of 
simultaneously occurring oscillatory events have not been
observed to date. The relatively stable phase delays that have
frequently been observed in about 10% of our recordings,
however,  did  not  occur  in  a  stimulus-specific  way,  as  would  be 
required  for  bindings  of  entities  in  the  S&A  model.  We  can 
explain  them  as  phase  differences  between  signals  from  excita¬ 
tory  and  inhibitory  neurons  that  are  locally  coupled  and  in¬ 
volved  in  the  generation  of  oscillations  in  the  respective  local 
assembly.  In  addition,  oscillatory  events  (spindles)  in  the  cortex 
are  highly  variable  in  their  frequencies,  amplitudes,  durations, 
and  delays  in  contrast  to  the  signals  in  the  S&A  model. 

The  impressions  from  our  observations  of  synchronization 
processes in the visual cortex and from our related neural
network  models  suggest  the  following  signal  dynamics  suitable 
for  transient  bindings  of  representations:  The  cortex  may  be  able 
to  represent  a  large  number  of  entities  nearly  simultaneously  by 
forming  synchronized  oscillations  of  short  durations  and  vari¬ 
able  frequencies  in  many  different  assemblies.  Such  oscillatory 
events  arc  statistically  independent  in  their  signal  courses  as 
long  as  the  entities  they  represent  do  not  belong  together. 
However,  oscillatory  active  assemblies  that  are  coupled  by 
(mutual)  connections  can  transiently  form  common  oscillatory 
states  of  zero-phase  difference  by  mutual  synchronization  and 
they  may  define  by  this  process  tlie  transient  binding  between 
different  entities.  Such  types  of  binding  do  not  require  distinct 
phases  for  distinct  entities.  Instead,  entities  may  be  defined  by 
the  internal  coherence  of  the  signals  in  a  subassembly,  and 
binding  between  subassemblies  may  be  defined  by  the  degree 
of  transient  signal  correlation  between  signals  of  different 
subassemblies. 

In addition to rhythmic synchronization, nonrhythmic synchronization
might support dynamic binding. The binding process
described above does not rely on rhythmic signals. Instead,
nonrhythmic signals might introduce even higher degrees of
freedom and thereby allow binding between more distinct
entities (in the sense of the S&A model). This means that
dynamic bindings might be achieved more generally by transient
correlations between signals of any type, including oscillations
as a reasonable case. In particular, irregular signals seem to
be highly appropriate for the labeling of related entities, as has
been shown in models of visual scene segmentation (Pabst et al.
1989). Participation of nonoscillatory signals in dynamic bindings
is further supported by our observation that in the visual
cortex oscillation spindles can be partially or completely suppressed
by transient stimuli that drive cortical neurons of similar
types synchronously in a stimulus-locked manner (Kruse et al.
1992). For these cases we have proposed that stimulus-locked
synchronization serves for the transient binding (Eckhorn et al.
1990). This view is supported by everyday experience. Strong
transient visual stimuli can be perceived rapidly, even in complex
visual scenes, much faster than the cortex would require to
generate several periods of a 40 or 50 Hz oscillation.

Neurons can process and transmit sufficient rates of information
for the representation and routing of complex dynamic
bindings. It is stated in S&A's introduction that neurons are slow
computing devices and that they communicate relatively simple
messages that can encode only a few bits of information (2 bits in
15 msec). This led S&A to the conclusion that a neuron's output
cannot encode names, pointers, or complex structures that
would be necessary, for example, to form dynamic representations
and to propagate (route) them into specific directions.
However, S&A may have underestimated the amount of transmitted
information because they used a frequency code. Being
aware of this, they added the argument that even if interspike
delays are included in neural coding, the time available for a
neuron to respond to its inputs is very limited, and so is the
amount of transmitted information, because a presynaptic neuron
can only communicate one or two spikes to a postsynaptic
neuron before the latter must produce an output (see note 4).

These arguments do not convince us. First, a single cortical
neuron generally has thousands of synapses at which one or two
spikes can appear within a few milliseconds, leading to a complex
time course of the postsynaptic potential with generally
high information density. Second, signal processing in cortical
neurons is not terminated by the generation of an output spike,
because spikes generally do not "reset" the membrane potential,
as can be seen directly in intracellular recordings from
cortical neurons (e.g., Douglas et al. 1991). In addition, it has
been shown rigorously that information rates in sensory and
motor systems of mammals reach values of 300 bits/sec on a
single nerve fiber if codes are chosen that match the signal
transfer properties of the respective neurons (Eckhorn et al.
1976). This is equivalent to average information rates of 4.5 bits
in 15 msec. However, if one calculates the actual time courses of
information, rates of 6 bits in 15 msec often occur (Eckhorn &
Poepel 1975). Such high rates in single neurons are assumed to
be sufficient for the signaling of complex messages, including
the routing of dynamic representations.
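
The conversion between the per-second rates and the 15-msec figures
quoted above can be checked with a few lines of Python. This is only
an illustrative sketch that assumes a constant information rate over
the window; the function names are ours and do not come from the
commentary or the cited studies.

```python
def bits_in_window(rate_bits_per_sec: float, window_msec: float) -> float:
    """Bits transmitted in a time window, assuming a constant information rate."""
    return rate_bits_per_sec * window_msec / 1000.0


def rate_from_window(bits: float, window_msec: float) -> float:
    """Inverse conversion: bits per window -> bits per second."""
    return bits * 1000.0 / window_msec


avg_bits = bits_in_window(300.0, 15.0)    # average rate quoted in the text
peak_rate = rate_from_window(6.0, 15.0)   # peak figure quoted in the text
sa_rate = rate_from_window(2.0, 15.0)     # the "2 bits in 15 msec" attributed to S&A

print(f"300 bits/sec  -> {avg_bits} bits in 15 msec")
print(f"6 bits/15 msec -> {peak_rate} bits/sec")
print(f"2 bits/15 msec -> {sa_rate:.0f} bits/sec")
```

On these numbers the average rate is a little more than twice the
figure attributed to S&A, and the peak rate is three times that figure.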

Much higher rates of information would be available at the
“nodes” of the S&A model if the functional units of the nodes
were realized by local groups of similar neurons. Such local
ensembles might use probability coding on parallel output
fibers, namely, the probability of discharge of any of these
neurons would be the (quasi analog) signal that transmits the
information of the ensemble. Information capacities in such
systems using group codes can reach much higher values than
those on single fibers because noise is reduced with increasing
size of the group. If, for example, intrinsic neural noise is
statistically independent (in the idealized case), then groups of
N = 100 neurons can transmit information rates that are higher
by a factor of 10 than those of a single neuron (proportional to the
square root of N).
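
The noise-averaging premise behind the square-root factor can be
illustrated with a small simulation. The sketch below is ours, not
part of the commentary: it assumes unit-variance Gaussian noise that
is independent across neurons, and it shows only that pooling N such
sources shrinks the noise by about the square root of N. The step
from that reduction to a proportional gain in information rate is the
commentary's claim and is not derived here.

```python
import math
import random
import statistics


def pooled_noise_sd(n_neurons: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Empirical SD of the average of n independent unit-variance noise sources."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        total = sum(rng.gauss(0.0, 1.0) for _ in range(n_neurons))
        means.append(total / n_neurons)
    return statistics.pstdev(means)


def ideal_gain(n_neurons: int) -> float:
    """Idealized noise-reduction factor from pooling N independent sources."""
    return math.sqrt(n_neurons)


single = pooled_noise_sd(1)
group = pooled_noise_sd(100)
print(f"ideal factor for N=100: {ideal_gain(100):.1f}")
print(f"simulated factor:       {single / group:.1f}")
```

With correlated noise, which is likely in real ensembles, the gain
would be smaller than this idealized factor.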

Although S&A’s arguments for labeling and routing of complex
representations by simply synchronizing the appropriate
“nodes” are convincing for me, I would like to stress the point
that at least in principle, other mechanisms might be used,
because neurons have the capacity of signaling high rates of
information during short intervals. In the S&A model the
propagation time for a synchronized state to the next node
requires about 20 msec, which is the cycle time observed in
stimulus-induced oscillations (Eckhorn et al. 1988; Gray &
Singer 1989). During 20 msec 8 bits can already be transmitted
by a single real neuron, which is enough for signaling about 500
alternative states (each available for the communication of
names or pointers).

It is still commonplace to identify connectionist (or neural
network) models with initially unstructured systems that are
adapted through supervised or unsupervised learning. Shastri
& Ajjanagadde’s (S&A’s) target article indicates how much richer
the paradigm can be. What I find most remarkable is the
progress that has been achieved in the first decade of the new
wave of connectionist research. The central technical issue in
this paper, connectionist variable binding, was viewed as intractable
only a few years ago but now has a variety of competing
solutions as outlined by S&A. And these solutions are no mere
implementation of formal logic - the representations preserve
many of the key computational advantages of connectionist
models: parallelism, context-sensitivity, robustness, and evidential
combination. A learning story needs to be added, but
there is progress here also.

The most important thing about the target article is not the
extent to which it is right or wrong, but the fact that we can now
evaluate detailed models of complex phenomena that can lay
claim to behavioral, biological, and computational adequacy.
Not long ago this would have been impossible. It is obviously not
easy to attack a hard problem (here, reflexive reasoning) from
the various perspectives simultaneously, but we now have the
tools for expressing such integrated models and this is beginning
to have a profound effect on the unified behavioral and brain
sciences.


Deconstruction of neural data yields
biologically implausible periodic oscillations

Shastri & Ajjanagadde (S&A) provide a fine example of circular
reasoning in their description of the “biological plausibility” of
their model, in that the results to which they appeal constitute
an AI-based interpretation of neurophysiological recordings
rather than raw measurements of activity in the visual cortex of
animals.

The “feature detector” interpretation deriving from the experimental
work of Mountcastle (1957), Lettvin et al. (1959),
Hubel and Wiesel (1962), and many others holds that when a
complex sensory stimulus arrives in the sensory cortex, a small
subset of neurons is vigorously excited and inhibited. Von der
Malsburg and Schneider (1986) were the first to investigate
systematically some of the ambiguities that arise when diverse
stimuli can generate the same static neural response; they
proposed a mechanism of phase-locked periodic oscillations to
resolve them. The findings of Engel et al. (1990) and Eckhorn et
al. (1988) appear to bear out his proposed solution, so that S&A
feel justified in pointing to the similarity between putative visual
cortical function and their model based on periodic orbits and
phase-locked pulses at some common frequency in their net.

It is in the aspect of periodicity that their model falls short of
the biological data; the fault, however, lies not with S&A but
with the interpretation by the biologists. If cortical neurons
were routinely observed to fire periodically at a designated
network frequency then the von der Malsburg interpretation
would be amply justified. Periodically firing neurons are indeed
found on occasion, particularly in the cat, which for unknown
reasons, has a peculiar tendency to narrow band oscillations in
all of its sensory cortices; but these neurons form a small tail in a
distribution of firing rates and patterns and the great majority of
neurons yield pulse interval histograms that conform more to
the Poisson than to the periodic distribution. Also the time-lagged
covariances between the pulse trains of pairs of neurons
tend to be vanishingly small (Abeles 1991), which would not be
so if the neurons usually shared a firing frequency, whether or
not they were in phase.
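
One common way to summarize how far a pulse interval histogram lies
from the periodic extreme is the coefficient of variation (CV) of the
interspike intervals: near 0 for a strictly periodic train, near 1 for
a Poisson train. The sketch below uses synthetic intervals only; the
25-msec mean interval and the use of the CV as the summary statistic
are our assumptions for illustration, not measurements or methods
taken from the commentary.

```python
import random
import statistics


def cv(intervals):
    """Coefficient of variation: SD of the intervals divided by their mean."""
    return statistics.pstdev(intervals) / statistics.fmean(intervals)


rng = random.Random(1)
MEAN_ISI_MSEC = 25.0  # hypothetical mean interval (a 40 Hz mean rate)

# Strictly periodic train: every interval is identical.
periodic = [MEAN_ISI_MSEC] * 5000
# Poisson train: intervals are exponentially distributed with the same mean.
poisson = [rng.expovariate(1.0 / MEAN_ISI_MSEC) for _ in range(5000)]

print(f"periodic CV: {cv(periodic):.2f}")
print(f"Poisson  CV: {cv(poisson):.2f}")
```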

A further problem is that the mean firing rates of most cortical
neurons are considerably less than the prevailing peak frequencies
of cortical dendritic potentials (local field potentials or
electroencephalograms [EEGs]) in the gamma band (including
the 40 Hz). This fact is obscured by such techniques as multiple
unit extracellular recording, which is a form of spatial ensemble
averaging over a local cortical domain; correlation analysis of
spike trains, which is a form of time ensemble averaging that
enhances the appearance of narrow band oscillation by expressing
the time-variance of a frequency as the decaying envelope
of the correlation oscillation at the center frequency; and spike-triggered
averaging of EEGs, which invokes the spatial ensemble
averaging that is inherent in dendritic extracellular field
potentials (Freeman 1975; 1991) and the time ensemble averaging
that enhances the center frequency of a distribution of
frequencies. Again, the cat (from which the bulk of new results
in this development have been taken) yields particularly simple
wave forms, but unaveraged records from the lagomorph and
simian visual cortices reveal broad spectrum EEG activity
relating to goal-directed behavior on single trials (Freeman &
van Dijk 1987), which is oscillatory, to be sure, but strongly
aperiodic.

In brief, pulse trains and EEG waves are mostly aperiodic. It
is the requirement of AI-based modeling that leads to manipulation
of the data for the extraction of center frequencies and to the
suggestion that there is rapid convergence of visual cortical
dynamics to limit cycle attractors. Now definitions of “phase
locking” and “phase coherence” (as distinct from spatial coherence
of broad spectrum activity) can only be based on the
existence of discrete frequencies. The characteristically sloppy
wave forms seen in raw data indicate that the cortex is rather
indifferent to precise control of the frequencies of its pulse trains
and dendritic current amplitudes, and that it allows them to vary
continually seemingly at random. But the phase of a continuous
frequency distribution cannot be defined for these events.

Even with the techniques of data refinement noted above,
which suggest that narrow band oscillations are capable of
coming into synchrony in time periods as short as one cycle (20
to 40 msec), there is a reported spread of coupling of ± 27° to ±
54° at 25 to 50 Hz and a 95% confidence range of 108° to 216°. If
these confidence intervals hold in the presented model, the
short time segments of 0.1 sec for the perceptual frames that are
invoked by S&A will not yield adequate reliability for readout by
detectors of phase lockings from a transmitting array. S&A note
some of the further unresolved difficulties regarding the management
of multiple-phase nodes and the compounding of the
difficulties when “soft” rules are brought into play, by which
continuous gradations of the degrees of synchronization are
used. Hence S&A’s model is biologically implausible.
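
The phase figures quoted in the preceding paragraph are mutually
consistent with a fixed timing spread of about ±3 msec, converted to
phase at each frequency, and with a 95% range of four times that
spread (that is, ±2 spreads). Both readings are our inference from the
numbers and are not stated in the text; the sketch below only checks
the arithmetic under those assumptions.

```python
def jitter_to_phase_deg(jitter_msec: float, freq_hz: float) -> float:
    """Phase angle (degrees) spanned by a timing offset at a given frequency."""
    period_msec = 1000.0 / freq_hz
    return 360.0 * jitter_msec / period_msec


JITTER_MSEC = 3.0  # assumption inferred from the quoted figures, not given in the text

for freq_hz in (25.0, 50.0):
    spread = jitter_to_phase_deg(JITTER_MSEC, freq_hz)
    full_range = 4.0 * spread  # +/- 2 spreads, matching the quoted 95% range
    print(f"{freq_hz:.0f} Hz: +/-{spread:.0f} deg, 95% range {full_range:.0f} deg")
```

At 50 Hz the assumed 95% range covers more than half a cycle, which is
the basis of the reliability concern raised above.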

The hypothesis underlying S&A's formulation of the “binding
problem” is that the visual cortex operates by extracting strong
correlations among a small subset of very active neurons in any
given time segment. An alternative hypothesis is that the cortex
operates by extracting weak covariances among very large populations
of neurons whatever the magnitudes of their individual
activity. On this premise a nonlinear dynamics can be constructed
(Freeman 1991) that envisions the existence of multiple
chaotic states and “itinerant” trajectories among them (Tsuda
1991). Here spatial coherence is crucial, and although it can
occasionally be detected as phase locking through massaging of
the data, the instantaneous frequency can and does vary pseudorandomly
over the short term. Simulations suggest that chaotic
dynamics may be unusually powerful at solving pattern
recognition tasks (Yao et al. 1991).

Biological memory systems are well known for their vagaries
(Bartlett 1934), and a case has been made that the essential
neural dynamics underlying the construction (not reconstruction)
of images during acts of remembering is chaotic (Freeman
1991; Skarda & Freeman 1987). S&A note properly that they are
not intending to elaborate a model for brain function so their
appeal to “biological plausibility” seems inappropriate. Their
model for a commonsense knowledge base may be fruitful in
managing retrieval of information from large libraries, but only if
the “knowledge” has already been once removed from the real
world, just as the purported “limit cycle” behaviors of nerve cells
have already been deconstructed from the actual performance of
neurons in living brains.

Must  we  solve  the  binding  problem 
in  neural  hardware? 

James W. Garson

Shastri & Ajjanagadde’s (S&A's) idea of representing variable
binding with signal synchrony and implementing deduction by
entrainment of signals is attractive. However, S&A postulate a
very specialized and rigid neural architecture to accomplish
these tasks. S&A's model verifies that a net can implement fast
(but limited) reasoning by exploiting signal synchrony. That does
not tell us much about the brain, however. It would be a miracle
if the brain contained networks involving collectors, enablers,
rho-btu nodes, tau-nodes, concept clusters, and switches exactly
as S&A describe. Our understanding of the task to be
modeled and the machinery available comes nowhere near to
constraining the solution this tightly.

S&A's  project  belongs  to  a  paradigm  in  connectionist  research 
that attempts literal implementation of machinery (such as
binding) drawn from classical AI. There is an alternative connectionist
paradigm that takes the project of understanding binding
much less literally (Elman 1991; Servan-Schreiber et al. 1989).
It postulates a very simple recurrent net architecture and
successfully trains nets (with modified backpropagation) on tasks
that are traditionally thought to involve variable binding. In this
empirically minded paradigm, no attempt is made to define
ahead of time the subtasks or their manner of implementation.
That is to be discovered by the net, not stipulated. Here
solutions to the “binding problem” emerge from weight selection
in a general purpose architecture that uses distributed
rather than local representations. This line of research has not
tackled reasoning directly, but it shows at least that some
implicit binding can be handled without special architecture.

The strong point of S&A's proposal is to display the advantages
of exploiting time in connectionist representation. However,
the same strategy is also exploited by nets trained in the
empirical paradigm. Here syntactic structure is represented by
the system's trajectory through phase space (van Gelder 1991).
Tasks classicists characterize as involving binding are accomplished
by setting weights so that these trajectories are constrained
in the right way. Empirically constructed systems do
not represent arguments and fillers literally with nodes or
groups of nodes. Representation is distributed, and it is only by
principal component analysis of pattern sequences on hidden
units that the outlines of the classical argument-filler ideas can
be brought into focus. This makes it hard to understand processing
with distributed representations in classical terms. However,
distributed representations are more difficult for a net to
acquire and manipulate (Chalmers 1990).

Empirical model building is no better than classical at proving
how the brain actually functions. However, it does sensitize us to
the idea that understanding reasoning in the brain may require
transforming classical constructs rather than models for their
literal implementation. Given the possibility that distributed
coding avoids the need for specialized architecture, speculation
on details of an architecture specifically designed to support
reasoning on local (or semilocal) representations is premature.

I have a number of more detailed worries about S&A’s
proposal:

1. I wonder how the theory deals with negation. How do we
handle negative conclusions, and (say) rules of the form: No A is
B? It is true that negation can be implicitly expressed in
production rules by replacing a negative consequent by the
corresponding positive antecedent. But applying this strategy to
“No A is B” leaves the consequent empty, and there is no
provision for empty consequents in S&A's scheme. As a useful
benchmark for how serious problems involving negation might
be, I invite S&A to explain how “No one is taller than himself”
could be a reflexively reached conclusion.

2. I still wonder how rules can be learned. The strategy
(described in sect. 10.6) of tuning weights to establish the right
connections between predicates only works if generic links
between all the right predicates are already available. Providing
links for all possible connections between all possible predicates
sets off a combinatorial explosion. (To make matters worse, there
have to be separate links for forward and backward reasoning
since neurons do not conduct in two directions.) So S&A's model
predicts that many rules simply cannot be learned because the
predicates happen not to be “neighbors.” Distributed representation
avoids this problem because “links” between arbitrary
concepts are forged as processes rather than in physical space.

3. This problem is compounded when we turn to propositional
attitudes, prepositions, and other modifiers. Consider all
the sentences we can construct by adding and deleting elements
to “Al saw Carol deceptively sell Dog1 to Ed in the presence of
Frank under the influence of alcohol in a park.” To represent
these sentences, modifiers and propositional attitudes must be
dealt with as arguments of the main verb (“sold” in this case).
(We certainly cannot represent every possible combination as a
separate predicate.) But then each predicate must have arguments
for all conceivable (and eventually learnable) modifiers
and propositional attitudes. The space investment in each predicate
is massive. Furthermore, I see no practical way to account
for reflexive reasoning from “Al sees that p” to “Al knows that p.”
(Reflexive reasoning for iterated attitudes or sentences where
modifier scope matters would also be impossible.) The argument
structure of English is too complex and open-ended to be
written into our neurons.

S&A  may  complain  that  I  take  their  model  too  literally.  They 
say  in  section  1.4  that  their  model  is  not  intended  as  a  blueprint 
of  how  the  brain  performs.  However,  if  their  model  is  not  at  least 
a  provisional  neural  wiring  diagram,  then  it  is  not  clear  how  the 
arguments  involving  neural  time  and  space  constraints  they  cite 
are  relevant  in  supporting  their  model  over  its  competitors.  If 
links  indicate  functional  structure,  we  must  wait  until  we  know 
how  links  are  implemented  before  we  apply  considerations 
concerning (say) speed of neural conduction.

Shastri & Ajjanagadde (S&A) note how carefully controlled
synchrony may subserve a rapidly evolving inference, but they
do not show how the knowledge being synchronized can be
learned, and how this learning process leads to the desired
synchrony relationships. A recently discovered family of supervised
learning, categorization, and prediction architectures provides
insight into aspects of this fundamental problem. These
neural architectures are generically called ARTMAP (Carpenter
& Grossberg 1991; 1992; Carpenter et al. 1991; 1992).

ARTMAPs can learn arbitrary analog or binary mappings between
learned categories of one feature space (e.g., visual
features) to learned categories of another feature space (e.g.,
auditory features). They perform well in benchmark studies
against alternative machine learning, genetic algorithm, or
neural network models. This may be because the Adaptive
Resonance Theory modules that go into ARTMAPs were derived
from a study of brain data (Grossberg 1987; 1988). In particular,
ARTMAPs can autonomously learn, categorize, and make predictions
about:

1. Rare events: A successful autonomous agent must be able
to learn about rare events that have important consequences
even if those rare events are similar to a surrounding cloud of
frequent events that have different consequences. Fast learning
is needed to pick up a rare event on the fly.

2. Large nonstationary data bases: Rare events typically occur
in a nonstationary environment whose event statistics may
change rapidly and unexpectedly through time. ARTMAP contains
a self-stabilizing memory that permits accumulating
knowledge to be stored reliably in response to arbitrarily many
events in a nonstationary environment under incremental learning
conditions until the algorithm's full memory capacity, which
can be chosen arbitrarily large, is exhausted.

3. Morphologically variable types of events: In many environments,
some information, including rulelike inferences, is
coarsely defined, whereas other information is precisely characterized.
ARTMAP is able to adjust its scale of generalization
automatically to match the morphological variability of the data.
It embodies a Minimax Learning Rule that jointly minimizes
predictive error and maximizes generalization using only information
that is locally available under incremental learning
conditions in a nonstationary environment.

4. Many-to-one and one-to-many relationships: Many-to-one
learning takes two forms: categorization and naming. For example,
during the categorization of printed letter fonts, many
similar exemplars of the same printed letter may establish a
single recognition category. All categories that represent the
same letter may be associatively mapped into the letter name or
prediction. This is a second many-to-one map arising for cultural,
not visual, reasons.

One-to-many learning is used to build up expert knowledge
about an object or event. A single visual image of a particular
animal may, for example, lead to learning that predicts animal,
dog, beagle, and my dog “Rover.” In many learning algorithms
the attempt to learn more than one prediction about an event
leads to unselective forgetting of previously learned predictions
for the same reason that these algorithms become unstable in
response to nonstationary data.

ARTMAP systems exhibit properties 1-4 because they implement
a set of heuristics qualitatively different from those of
error-based learning systems:

5. Pay attention: An ARTMAP can learn top-down expectations
(also called prototypes, primes, or queries) that can bias
the system to ignore masses of irrelevant data. These queries
“test the hypothesis” that is embodied by the category as they
suppress features not in the prototypical attentional focus.

6.  Hypothesis  testing  and  match-based  learning:  The  system 
actively  searches  for  recognition  categories,  or  hypotheses, 
whose  top-down  expectations  provide  an  acceptable  match  to 
bottom-up  data.  The  top-down  expectation  focuses  attention 
upon,  and  binds,  that  cluster  of  input  features  that  it  deems  to  be 
relevant. 

7.  Choose  globally  best  answer:  After  learning  self-stabilizes, 
every  input  directly  selects  the  globally  best  matching  category 
without  any  search. 

8. Calibrate confidence: A confidence measure, called vigilance,
calibrates how well an exemplar matches the prototype
that it selects. If vigilance is low, even poorly matching exemplars
can then be incorporated into one category, hence compression
and generalization are high. If vigilance is high, few
exemplars activate the same category, hence compression and
generalization are low. In the limit of very high vigilance,
prototype learning reduces to exemplar learning. The Minimax
Learning Rule adjusts the vigilance parameter just enough to
initiate hypothesis testing to discover a better category, or
hypothesis, with which to match the data. In this way, a minimum
amount of generalization is sacrificed to correct the error.

9. Rule extraction: At any stage of learning, a user can
translate the state of an ARTMAP into an algorithmic set of if-then
rules. ARTMAPs are thus a new type of self-organizing production
system. The Minimax Learning Rule determines how
abstract these rules will become.

10. Properties scale: All the desirable properties of ARTMAPs
scale to arbitrarily large problems.

11. Working memory: The input level of an ARTMAP may be a
working memory designed so that any grouping of its stored
events can be stably learned in real time. Using STORE working
memory models (Bradski et al. 1992a; 1992b), temporally evolving
rules may be learned.

12. Temporal synchrony: The first ART articles (Grossberg
1976; 1978) predicted that visual cortical codes could be expressed
by synchronous oscillations in which cooperatively
linked cells oscillate in phase and that oscillations could be
replaced by equilibrium points if no “slow” variables, such as
inhibitory interneurons or chemical modulators, exist. Within
ART, a synchronized oscillation can occur when bottom-up
feature-selective and top-down expectation signals fuse into an
attentive resonance that can support new learning and a conscious
perceptual experience. The predicted linkage between
standing waves, attention, learning, and conscious experience
has recently attracted much interest (e.g., Crick & Koch 1990b).
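
The role of vigilance described in item 8 can be made concrete with a
minimal sketch of the match test used in binary ART networks, where an
exemplar I is accepted by a category with prototype w when
|I AND w| / |I| is at least the vigilance value. The feature vectors
and function names below are hypothetical, and the sketch covers only
the acceptance test, not category search, learning, or the Minimax
Learning Rule.

```python
def match_ratio(exemplar, prototype):
    """ART1-style match: |I AND w| / |I| for binary feature vectors."""
    overlap = sum(i & w for i, w in zip(exemplar, prototype))
    return overlap / sum(exemplar)


def resonates(exemplar, prototype, vigilance):
    """True if the prototype matches the exemplar well enough to be accepted."""
    return match_ratio(exemplar, prototype) >= vigilance


exemplar = [1, 1, 1, 1, 0, 0]
prototype = [1, 1, 0, 0, 0, 1]  # shares 2 of the exemplar's 4 active features

print(match_ratio(exemplar, prototype))     # 0.5
print(resonates(exemplar, prototype, 0.3))  # low vigilance: poor match accepted
print(resonates(exemplar, prototype, 0.9))  # high vigilance: rejected, search continues
```

Raising the vigilance value therefore splits the data into more,
narrower categories, which is the trade-off between compression and
exemplar learning that item 8 describes.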

After ART was introduced to analyze data about attentive
learning and recognition, Grossberg and Mingolla (1985a;
1985b) modeled processes of preattentive vision. A new type of
bipole cell was predicted to link perceptual features cooperatively
into emergent boundary segmentations within a Boundary
Contour System (BCS). Grossberg and Somers (1991; 1992)
have demonstrated that both the BCS and ART circuits can link
cells cooperatively into rapidly synchronizing oscillations over
large cellular distances. The oscillation is not necessary for
binding per se, because features can be bound in a way that
explains large data bases using BCS and ART models in which no
oscillations occur. The oscillations provide an extra degree of
freedom that scales the amount of asynchrony that can be
tolerated before the wrong object parts may be bound together.

Shastri & Ajjanagadde (S&A) have provided an impressive
demonstration of the power of synchronous activation to handle
the dynamic-binding problem in reflexive reasoning; however,
predicate-argument bindings can also be handled using the
tensor product approach advocated by Smolensky (1990). Halford
et al. (1993) have proposed an analogical reasoning model in
which an N-place predicate is represented by a tensor product of
rank N + 1, with one vector representing the predicate and N
vectors representing arguments. We therefore have a situation
in which two very different approaches have similar achievements.
Synchronous activation can handle reflexive reasoning
and analogical reasoning (Hummel et al., in press); tensor
product representations handle production systems (Dolan &
Smolensky 1989) and memory retrieval (Humphreys et al.
1989), as well as analogical reasoning.

These approaches may be competitive, or they may be complementary,
so that synchronous activation models deal with
reflexive or implicit reasoning and tensor product models might
be more appropriate for reflective or explicit reasoning. Tensor
product representations can represent certain properties of
relations that do not appear to be possible for the synchronous
activation approach. A relation R(a,b,...,n) can be handled
by a tensor product of rank N + 1 (Halford et al. 1993). This not
only represents the predicate-argument bindings but also the
interactions within the structure. For example, the tensor product
representation of R(a,b,c) represents the influence of c on
R(a,b), the influence of b on R(a,c), and the influence of a on
R(b,c). The synchronous activation approach can handle slot-filler
bindings but it does not appear able to represent these
higher-order relations that are important to complex concepts.

It is possible to collapse over any vector in the tensor product
(Humphreys et al. 1989), so any subset of the possible relations
can be represented. For example, given that R(a,b,c) is represented
by a rank-4 tensor product, R(a,b), R(b,c), R(a,c), and
R(a,b,c) can be processed. Furthermore, any argument can be
retrieved given the predicate and remaining arguments, and the
predicate can be retrieved, given the arguments. Thus, the
tensor product representation has the flexibility and power that
are characteristic of explicit or reflective reasoning.
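The binding, retrieval, and collapsing operations described above can be illustrated with a minimal sketch. It assumes orthogonal (one-hot) vectors for every symbol and stores the tensor sparsely as a dictionary; the vector sizes, the symbol names (GIVES, JOHN, MARY, BOOK, TED), and the helper functions are hypothetical choices made for illustration and are not taken from Halford et al. (1993) or Humphreys et al. (1989).

```python
from itertools import product

PRED_SIZE, ARG_SIZE = 2, 4  # illustrative sizes (assumption)

def one_hot(index, size):
    """Unit vector; distinct indices give mutually orthogonal vectors."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def outer(*vectors):
    """Tensor product of the vectors, stored as {index tuple: value}."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = 1.0
        for vec, i in zip(vectors, idx):
            value *= vec[i]
        if value:
            tensor[idx] = value
    return tensor

def add(t1, t2):
    """Superimpose two tensors of the same rank."""
    out = dict(t1)
    for idx, value in t2.items():
        out[idx] = out.get(idx, 0.0) + value
    return out

def retrieve(tensor, cues, size):
    """Contract the tensor with a cue vector on every axis but one.

    cues holds one vector per axis, with None at the axis to be retrieved;
    size is the length of the vector on that free axis.
    """
    free = cues.index(None)
    result = [0.0] * size
    for idx, value in tensor.items():
        weight = value
        for axis, cue in enumerate(cues):
            if cue is not None:
                weight *= cue[idx[axis]]
        result[idx[free]] += weight
    return result

def collapse(tensor, axis):
    """Sum over one axis, leaving a tensor of rank one lower."""
    out = {}
    for idx, value in tensor.items():
        reduced = idx[:axis] + idx[axis + 1:]
        out[reduced] = out.get(reduced, 0.0) + value
    return out

# Hypothetical symbols: one predicate vector and four argument vectors.
GIVES = one_hot(0, PRED_SIZE)
JOHN, MARY, BOOK, TED = (one_hot(i, ARG_SIZE) for i in range(4))

# Two instances of a 3-place predicate, superimposed in one rank-4 tensor.
memory = add(outer(GIVES, JOHN, MARY, BOOK), outer(GIVES, TED, JOHN, BOOK))

# Retrieve an argument from the predicate and the remaining arguments.
recipient = retrieve(memory, [GIVES, JOHN, None, BOOK], ARG_SIZE)
# Retrieve the predicate from the arguments.
predicate = retrieve(memory, [None, JOHN, MARY, BOOK], PRED_SIZE)
# Collapse over the third argument to obtain the two-argument relation.
pairwise = collapse(memory, 3)
```

With orthogonal vectors retrieval is exact; with merely similar (non-orthogonal) vectors the retrieved vector would be a weighted blend, which is the property distributed accounts usually exploit.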

There are interesting correspondences in the way capacity
limitations are handled by the two models. In synchronous
activation models the number of distinct phases is limited to
between 5 and 10, whereas Halford et al. (1993) propose, after a
review of the working memory literature, that tensor product
models are limited to rank 5. They argue that the flexibility and
power of tensor product models in handling complex reasoning
requires each argument to be represented by a separate vector,
which has the status of a dimension in that it provides an
independent source of variation. This corresponds to a distinct
entity in S&A’s terms. The information represented by each
dimension, or each distinct entity, is variable, often over a wide
range, but the number of dimensions, or entities, is limited by
the rank of the tensor product, or number of phases. Thus the
models agree that the limit is not in the amount of information
expressed by each entity or dimension but in the number of
independent dimensions (entities) represented in parallel.

It therefore appears that synchronous activation and tensor
product models have converged on a theoretical basis for the
concept of a chunk (Miller 1956), which is an independent item
of information of arbitrary size and is the unit that best defines
human processing limitations. Both explain the finding that the
amount of information that can be represented in any one item
(the chunk size) is variable over a wide range but the number of
independent items (number of chunks) that can be represented
in parallel is very restricted.

Shastri & Ajjanagadde (S&A) provide a number of important
contributions to our understanding of the psychology of reasoning
and inference and its modeling with connectionism. Their
distinction between reflexive and reflective reasoning is well
made and in accordance with a growing consensus that many
types of human inference, particularly those directly involved in
language comprehension, are simply too fast for the sort of
deliberative processing associated with, say, solving syllogisms.
The network models discussed have the advantages of being
computationally simple and neurologically plausible and make
good contact with other areas of research such as working
memory. On the whole the paper makes a serious and distinguished
contribution to the area and will, most likely, be widely
cited.

Some clarification of the manner in which rules are acquired
would be in order, however, and would allow the theory to
develop further. S&A are obviously alert to this and touch on a
crucial point when they state that they are considering learning
“in the context of preexisting predicates and concepts where it is
desired that the cooccurrence of events should lead to the
formation of appropriate connections between predicate arguments”
(sect. 10.6). Recent developments in the learning theory
on stimulus equivalence and relational frame effects bear on this
issue and indicate the importance of quite extensive exposure to
such cooccurrent events. Briefly, these developments suggest
that large amounts of bidirectional training across a number of
domains are required by children before even the simplest
stimulus equivalence relation can be acquired. For example,
even simple symmetric relations between entities are typically
not exhibited by children of less than two years old. It seems
rather that a large number of forward (A goes with B) and
backward (B goes with A) mappings must be experienced across
a series of domains before domain invariant information (rules)
can emerge. Once these are acquired, merely learning that C
goes with D is sufficient for the reverse relation, D goes with C,
to be inferred (see Hayes [1991] for a detailed account of
relational frame theory). The key point is that rules themselves
are never acquired directly or within one domain; instead, the
invariant information that instantiates them emerges throughout
a series of behavioural interactions across several domains.

Viewed thus, rule acquisition is the reverse of the variable-binding
problem. Variable binding entails the attachment of
content with a rule (or an abstract structure) for use in a
particular situation; rule acquisition and instantiation involves
the functional detachment of common structure from a set of
variable contents for use in future situations with new content.
In S&A’s terms this means that examples of both forward and
backward pairings between predicate arguments must be explicitly
experienced, in a variety of situations, before the sorts of
inferences they discuss are possible. In our own work we have
modeled some of these effects and shown how performance on
inference tasks, including reasoning about kinship relations,
improves as a function of exposure to related domains (Barnes &
Hampson 1992). A solution to the acquisition problem in the
context of this model would probably help solve the problem of
memorizing facts or of converting dynamic bindings to the static
patterns S&A also identify.

It would also be interesting to see how easily the model
extends to analogical thinking. Both analogical thinking and the
more typical inferences considered by S&A can be construed as
similar processes in that both involve a match between domain
invariant information. It should not be impossible to extract
higher-order relations between predicate arguments across
many pattern sets and to use these to support analogical reasoning
in a system such as S&A’s.

Finally, an area that the field as a whole could now usefully
consider is the movement from reflective to reflexive reasoning,
which might be expected to follow practice in certain situations.
This mode-shift in reasoning seems likely given the assumption
that reflective reasoning involves conscious deliberation and
reflexive reasoning is more automatic, and the evidence that
practice generally shifts processing in the direction of automaticity.
Previous accounts of strategy shifts in reasoning and problem
solving have focused on changes in the representation used,
such as from visual image to verbal (e.g., Kosslyn et al. 1977); it
might be more fruitful now to consider whether (in addition or as
an alternative to these representational shifts) there is a shift
from reflective to reflexive reasoning modes - though as S&A
astutely point out, there may well be some situations that simply
do not permit reflexive reasoning (sect. 8.2.5).

Not all reflexive reasoning is deductive

Shastri & Ajjanagadde’s (S&A’s) model is a well fleshed-out
proposal linking conceptual inference with neural representation.
Assigning one phase to each concept occurrence is a clever
idea that is worthy of further development. In this commentary,
we discuss some of the problems that remain.

In their note 3, S&A see their notion of reflexive reasoning as
a generalization of the well-established notion of automatic
processing. In fact, the converse would seem to be true; S&A
talk about rapid deductive inference as if it were the only kind
of reflexive, or automatic (unconscious? - cf. Velmans 1991),
reasoning people perform. In reality, it is just one of many kinds,
some of which are quite general and others of which are very
specific.

1.  Determining  a  probable  relationship  between  two  or 
more  concepts.  This  is  the  kind  of  reasoning  we  do  when  we 
interpret  novel  (unlexicalized)  nominal  compounds  such  as 
temporal  pattern  matcher.  (Downing  [1977]  has  shown  that  the 
class  of  relationships  between  elements  in  nominal  compounds 
is  large  and  unconstrained;  but  see  Levi  1978.) 

2. Computing the semantic distance between two concepts,
which is a fundamental part of such automatic reasoning as
lexical disambiguation (Charniak 1983; Hirst 1987; 1988; Hirst
& Charniak 1982) and certain kinds of problem solving (Hendler
1987). This kind of reasoning was achieved in the work cited by
means of marker passing; but, notwithstanding S&A’s remarks
about the similarity of their approach to marker passing, the
computation of semantic distance does not seem amenable to
any kind of phase encoding, for it relies crucially upon a static
property of the knowledge base - that the physical distance
between representations of concepts corresponds reasonably
well to the semantic distance.

3. Elaborative inference, such as supplying typical values for
roles whose fillers are left implicit or unspecified. For example,
subjects reading Mary stirred the coffee showed subsequent
facilitation for spoon in a word-completion task (Whitney &
Williams-Whitney 1990). (We speculate that this is not mere
word association; for example, Mary stirred the paint would
facilitate stick but not spoon. This hypothesis is presently being
tested in work in progress by the first author in collaboration
with Michael K. Tanenhaus and Gail Mauner.) Similarly, Cahill
and Mitchell (1987) found that reading a passage that described a
goal and a precondition for achieving it led to the inference of a
plan. The exact conditions under which elaborative inferences
are made have been the subject of some debate in the literature
(Dosher & Corbett 1982; Lucas et al. 1990; Whitney &
Williams-Whitney 1990); here we need only note that they do
occur under at least some conditions.

4. The interpretation of direct and indirect speech acts and of
discourse repairs and the recognition, in general, of intent, as
distinct from literal meaning, in discourse. Such tasks, when
described in full logical detail, are extraordinarily complex (cf.
Allen & Perrault 1980; Cohen et al. 1990; McRoy 1993; McRoy &
Hirst 1993). Although such interpretation is surely based on
compiled rules rather than carried out from first principles each
time (cf. Gibbs 1983), it remains automatic and not deductive.
More generally, much expert reasoning is reflexive interpretation,
involving the recognition and categorization of patterns in
the domain of expertise (see, e.g., Cooke 1992, and the references
cited therein). (Note that this is not categorization in the
sense that S&A use that word in their sect. 2.4.)

5. Abductive inference, which can also be extremely rapid.
On this, S&A allude to another paper of theirs (Ajjanagadde
1991), but offer no details.
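The point made in item 2, that semantic distance depends on a static property of the knowledge base, can be made concrete with a minimal sketch of marker passing over a small network. The concepts and links below are a hypothetical toy example, and breadth-first spreading is used as a stand-in for the marker-passing schemes in the work cited, whose details differ.

```python
from collections import deque

def build_graph(links):
    """Build an undirected adjacency map from a list of concept pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def spread(graph, origin):
    """Pass a marker outward from origin, recording for each node the
    step at which the marker first reaches it."""
    steps = {origin: 0}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                frontier.append(neighbour)
    return steps

def semantic_distance(graph, a, b):
    """Length of the shortest path on which markers from a and b collide,
    or None if the two concepts are not connected."""
    marks_a = spread(graph, a)
    marks_b = spread(graph, b)
    collisions = set(marks_a) & set(marks_b)
    if not collisions:
        return None
    return min(marks_a[n] + marks_b[n] for n in collisions)

# Hypothetical toy network with two senses of an ambiguous word.
network = build_graph([
    ("astronomer", "telescope"),
    ("telescope", "star_celestial"),
    ("astronomer", "person"),
    ("person", "actor"),
    ("actor", "film"),
    ("film", "star_celebrity"),
])

near = semantic_distance(network, "astronomer", "star_celestial")
far = semantic_distance(network, "astronomer", "star_celebrity")
```

The distance is read off the fixed link structure alone; nothing in the computation refers to when a node fires, which is why it is hard to see how a phase code could carry it.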

Moreover,  S&A  are  not  clear  enough  about  neural  plausibility 
-  specifically,  whether  individual  neurons  or  ensembles  in  their 
representation  can  possibly  have  biological  correlates.  On  the 
one  hand,  the  imposition  of  detailed  constraints  on  connectivity 
and  firing  rate  implies  a  biological  interpretation.  On  the  other 
hand,  their  model  is  essentially  localist;  the  representation 
seems  biologically  implausible  even  when  symbolic  neurons  are 
replaced  by  localist  ensembles  late  in  the  development  (sect. 
7.3).  A  symbolic  representation  is  perfectly  acceptable  at  an 
abstract  level  of  explanation,  but  experimentation  with  timing 
parameters  makes  sense  only  if  the  representation  itself  is 
neurobiologically  consistent. 

We believe that in certain ways the model is closer to marker
passing than the authors suggest. They refer to Fahlman’s (1979)
original proposal (sect. 3) and to generate-and-filter systems that
“evaluate the relevance of . . . paths [after collisions]” (note 14)
(e.g., Charniak 1983; Hendler 1987; Hirst 1987; Norvig 1989).
However, other marker-passing models have been proposed
where collisions generate inferences in first-come, first-served
fashion (e.g., Martin & Riesbeck 1986; Wu 1989). Markers carry
variable-binding information rather than what S&A call “back-pointers
to the original and immediate source of the marker”
(ibid.), and markers arriving at the same node are considered to
“collide” only if their bindings match. If we further impose a
phase for each variable, a very similar model results.

The model as proposed does not accommodate uncertainty.
Indeed, the “more complex messages” (ibid.) that are carried by
markers are also partly to handle probabilities (Wu 1989). S&A
suggest integrating temporal synchrony with an earlier evidential
system (sect. 9.2), but defining a probability distribution
over inferences is not straightforward when variable bindings
are permitted. This is because the binding hypotheses themselves
interact, both logically (say, by mutual exclusivity) and
statistically. So it is not even clear what the functional specification
ought to be.

An approach that might therefore help is the maximum-entropy
distribution as generalized for hypotheses involving
arbitrary variable bindings (formulated by Wu 1992a; 1992b).
Because variable bindings make full probability computation too
expensive, Wu also gives a robust approximate method, AME
(approximate maximum entropy), that allows arbitrary subpartitions
of probabilistic constraints and hypotheses to be preselected.
How tractable approximations such as this could be
incorporated in a fixed connectionist architecture is an important
issue for future research.

S&A compare their temporal-synchrony method for binding
with the reduced-description approach (sect. 9.4). However, the
comparison is based on the encoding of rules as directed dependency
graphs. Combining reduced descriptions with rule
graphs is inappropriate because the object of reduced descriptions
is to avoid representing rules locally (e.g., Pollack 1988;
1990; Stolcke & Wu 1992). Also, contrary to S&A’s statement
that reduced-description approaches “will also have to be augmented
in order to deal with noise,” inherent resistance to noise
is one of the nice properties that results from distributed
representation.

Shastri & Ajjanagadde (S&A) claim that their “computational
model takes a step toward . . . resolving the artificial intelligence
paradox,” namely, the gap between the ability of humans
to draw a variety of inferences effortlessly, spontaneously, and
with remarkable efficiency on the one hand and the results about
the complexity of reasoning reported by researchers in artificial
intelligence on the other hand. This claim seems to be too
strong. S&A’s logic has certain special features. These features
are quite remarkable and are the result of an attempt to find a
class of formulae which is as expressive as possible and whose
satisfiability can be decided by the propagation of rhythmic
activity in parallel time bound by the length of the shortest proof
and with space bound by the size of the formula. Nevertheless,
from a logic point of view the expressive power of S&A’s system
is fairly limited. And the mere fact that artificial intelligence
researchers have not investigated this particular logic does not
imply that a significant step toward resolving the artificial
intelligence paradox has been made.

But have artificial intelligence researchers really not investigated
S&A’s logic? Because of the imposed restrictions, S&A’s
system need not unify expressions; the matching operation
suffices. Whereas unification is inherently sequential (Dwork et
al. 1984), matching is known to be parallelizable in an optimal
way (Ramesh et al. 1989). There is also a striking similarity
between S&A’s reasoning mechanism and certain reduction
techniques applied in automated theorem provers such as the
evaluation of isolated connections (Bibel 1988). For example, if
each variable occurring in the conditions of a rule occurs also in
the conclusion of the rule then a query, all of whose arguments are
bound to constants, can be solved by evaluating isolated connections
only in precisely the same way that S&A’s system solves
this query. As shown by Hölldobler (1990) the evaluation of
isolated connections can be applied in parallel. Moreover, if a
formula is as restricted as mentioned above and can be solved by
applying this reduction technique only, then the bounds on time
and space are comparable to the bounds in S&A’s system. But
whereas the reduction techniques in an automated theorem
prover are applied in the larger context of proving the satisfiability
of an unrestricted first-order formula, S&A’s system is
designed to show the satisfiability of a very special class of
formulae and, hence, is more elaborate for this special class. If
the similarity between reduction techniques applied in automated
theorem provers and the computational model presented
in this article holds for most of the special features, then S&A’s
work shows that automated theorem provers which apply these
reduction techniques in parallel are adequate in the sense that
they solve simpler problems faster than more difficult ones.
Unfortunately, the authors have not investigated this similarity.
The results of the target article would be a step toward
resolving the artificial intelligence paradox if commonsense
reasoning problems were expressible in S&A’s logic. The paper
contains some predictions on this topic and it remains to be
seen whether these predictions hold. If they do then the gap
between the ability of humans to draw a variety of inferences as if
it were a reflex and the results about the complexity of reasoning
reported by researchers in artificial intelligence is not a paradox
at all. If problems that can be solved effortlessly by humans can
be expressed in S&A’s logic, then these problems are just
simpler than the problems investigated in the artificial intelligence
community.

Shastri & Ajjanagadde (S&A) have made an important contribution
to the development of a connectionist representational
theory that accounts well for the fundamental systematicity of
human reasoning. The most basic contribution of their work is
its demonstration that a connectionist-style model can represent
and use propositions and, more generally, structured information.
Despite the current flurry of interest in synchrony for
binding within the neural network community, comparatively
few modelers have proposed serious accounts of how synchrony
can actually perform useful work. Typically, networks are shown
to establish synchrony and the functional significance and capacity
of that synchrony is left to the imagination. In contrast, S&A
provide an explicit account of the representation of structure via
synchrony in a connectionist-style architecture.

Further work on the use of synchrony in knowledge representation
is needed, and a number of important issues deserve
careful scrutiny. We consider one of the most basic issues: the
inherent tradeoff between distributed representations and systematic
bindings among units of knowledge. The primary advantage
of a distributed representation is its ability to capture
naturally the similarity structure of the represented domain
(similar entities can share a greater number of units in the
representation than dissimilar entities). The disadvantage is that
binding systematicity decreases (i.e., the likelihood of a binding
error increases) with the extent of distribution. Consider the
extreme cases. In a purely localist representation, no binding
errors are possible. If there are N units, each representing a
different concept, then the network can simultaneously represent
its entire vocabulary of concepts without any ambiguity
about what is being represented. The other extreme is the
completely distributed case, in which each of the 2^N binary
patterns possible over N units represents a distinct concept. In
this case, no two patterns may be superimposed without spuriously
creating a new pattern; in the event of superposition,
binding errors are inevitable. Intermediate degrees of distribution
present intermediate likelihoods of binding errors.
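The two extreme cases can be checked directly with a small sketch. Patterns over N units are encoded here as N-bit integers and superposition is taken to be a bitwise OR; both choices, and the value N = 3, are assumptions made for illustration.

```python
from itertools import combinations

N_UNITS = 3  # illustrative network size (assumption)

def superimpose(*patterns):
    """Activate every unit that is active in at least one pattern."""
    combined = 0
    for pattern in patterns:
        combined |= pattern
    return combined

def decompositions(target, vocabulary):
    """All unordered pairs of distinct vocabulary patterns whose
    superposition equals the target pattern."""
    return [(p, q) for p, q in combinations(sorted(vocabulary), 2)
            if superimpose(p, q) == target]

# Localist extreme: one unit per concept, so N concepts in all.
localist_vocab = {1 << i for i in range(N_UNITS)}
localist_mix = superimpose(0b001, 0b100)
localist_readings = decompositions(localist_mix, localist_vocab)

# Fully distributed extreme: every one of the 2^N patterns is a concept.
distributed_vocab = set(range(2 ** N_UNITS))
distributed_mix = superimpose(0b011, 0b110)
distributed_readings = decompositions(distributed_mix, distributed_vocab)

# The superimposed pattern is itself the code of a further concept.
spurious = distributed_mix in distributed_vocab
```

In the localist case the superimposed pattern has exactly one reading, whereas in the fully distributed case the same activity is consistent with many different pairs of concepts and also coincides with the code of a third concept.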

The value of synchrony is that it allows a network to use a
distributed representation without being subject to binding
errors, thereby alleviating the tradeoff between similarity and
systematicity. There is a catch, however, which we term the one-level
restriction: Synchrony can only represent element bindings
at one level of abstraction or hierarchy at a time. That is,
synchrony cannot simultaneously represent the binding of elements
to each other and also the bindings of the units within the
patterns representing those elements. This restriction is evident
in S&A’s model. The representation of propositions is
distributed over multiple predicate and object units but the
predicates and objects themselves are strictly localist. The one-level
restriction implies that hierarchical structures will be
difficult to represent. It is unclear, for example, how S&A would
extend their system to represent propositions such as “Jane
knows that Ted gave Mary flowers,” in which an entire proposition
(rather than a simple object) is bound to the role of “what is
known.”

The one-level restriction has other important implications for
S&A’s model. A basic strength of the model is its capacity to stack
an unlimited number of predicates on top of an object without
additional cost (i.e., any number of predicate units may fire on a
given time slice). This capacity is critical both to the model’s
operation (it is directly responsible for its ability to “search” in
parallel down multiple inference paths) and for its behavioral
predictions (specifically, that many predicates modifying few
objects should require less capacity than few predicates modifying
many objects). But S&A’s model can only stack predicates
because its representations of predicates are nonoverlapping
(localist). If S&A adopted a distributed representation for predicates
then stacking would entail sacrificing systematicity of
bindings. S&A’s use of localist predicates is thus more than a
notational convenience; it is an integral part of the model’s
architecture with far-reaching implications.
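The stacking property can be illustrated with a minimal sketch of phase-based binding. It is a bookkeeping abstraction rather than S&A’s network: each object is given its own time slice, localist role units fire in the slice of the object they are bound to, and the class name, the role labels, and the phase limit are hypothetical.

```python
class SynchronyBuffer:
    """Toy record of which localist role units fire in which time slice."""

    def __init__(self, max_phases):
        self.max_phases = max_phases  # assumed limit on distinct phases
        self.phase_of = {}            # object -> index of its time slice
        self.slices = []              # slices[i] = role units firing in slice i

    def bind(self, role, obj):
        """Make the role unit fire in synchrony with the object."""
        if obj not in self.phase_of:
            if len(self.slices) >= self.max_phases:
                raise OverflowError("no free phase for a new object")
            self.phase_of[obj] = len(self.slices)
            self.slices.append(set())
        self.slices[self.phase_of[obj]].add(role)

    def filler(self, role):
        """Return the object whose time slice contains the role unit."""
        for obj, phase in self.phase_of.items():
            if role in self.slices[phase]:
                return obj
        return None

# Many predicates stacked on one object occupy a single phase.
stacked = SynchronyBuffer(max_phases=2)
for predicate in ("own", "can_sell", "likes", "knows"):
    stacked.bind((predicate, "arg1"), "mary")

# Few predicates over many objects exhaust the available phases.
spread_out = SynchronyBuffer(max_phases=2)
overflowed = False
try:
    spread_out.bind(("give", "giver"), "john")
    spread_out.bind(("give", "recipient"), "mary")
    spread_out.bind(("give", "object"), "book1")
except OverflowError:
    overflowed = True
```

Stacking is free in this sketch only because each role is a single, nonoverlapping unit; if roles were overlapping patterns of units, the set stored for a time slice would no longer say which roles had been superimposed.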

The one-level restriction does not imply that it is impossible
to use a distributed representation at more than one level of
abstraction; rather, it implies that if the lower-level (e.g., predicate)
representation is distributed then multiple elements of this
kind cannot in general be combined within a single time slice.
That is, S&A could represent their predicates in a distributed
fashion but they would no longer be able to stack them. In our
own work (Hummel & Holyoak 1992; Hummel et al., in press)
we have explored the use of synchrony to represent propositions.
Like S&A’s model, ours uses synchrony to bind objects to
case roles within propositions; but unlike S&A’s model, ours
uses a distributed representation of objects and predicates. The
benefits of our representation are all those typically associated
with distribution (e.g., similarity, automatic generalization,
etc.). The cost is that our model cannot stack predicates in the
same unbounded manner as S&A’s model. Rather, it represents
the binding of one object to only one case role per time slice.

We have come full circle, returning to the tradeoff that
originally motivated the use of synchrony. S&A’s model and ours
represent opposite extremes of this tradeoff, only this time,
synchrony - already assumed - is not available to ease our
dilemma. Some degree of distribution at the level of predicates
seems necessary; thus S&A are forced to search an IS-A hierarchy
to capture similarity relations. And our restriction of one
object-to-case-role binding per time slice may entail processing
that is too serial for the type of reflexive reasoning performed by
S&A’s model. An interesting question concerns what compromises
are possible between these extremes (e.g., repeated
sampling of randomly stacked distributed representations). It
seems that the tradeoff between distribution and systematicity is
a real one, and synchrony for dynamic binding - although it
eases the pain of the tradeoff - is not sufficient to make it
disappear completely.

I would like to relate this thought-provoking target article by
Shastri & Ajjanagadde (S&A) to the dynamic linking by synchronization
of rhythmic activity in neural networks. This has been
discussed and simulated for sensory segmentation (Shimizu
et al. 1985; von der Malsburg & Schneider 1986) and for
knowledge-dependent image decomposition and its resynchronization
for interpretation in vision, including the development
of alternative hypotheses in a quasi time-sharing processing
mode at separate phase positions of the rhythmic activity (Koerner
et al. 1987; 1990). Whereas those approaches dealt with
signal level description and the transition from signal to symbol
level, S&A’s approach bridges to a more elaborate structuring of
the represented knowledge, applying this rhythmic control as a
mechanism for easy reasoning in sophisticated knowledge structures,
dynamically linking just that part of the stored knowledge
that is needed to solve the problem posed. This is the complementary
aspect of the above-mentioned approaches (see sect.
2.5).

This  contribution  bears  on  the  still  controversial  issue  of 
whether  or  not  oscillatory  phenomena  in  cortical  recordings  are 
relevant to an understanding of cortical processing: Yes, oscillations
make a lot of sense there. Having dealt with directly
related problems from a similar point of view, I agree with S&A
in many respects, but instead of simply summarizing all the
points I agree with, I will discuss extensions of the conceptual
design  of  the  model  that  are  needed  to  bridge  the  gap  between 
the  signal  and  the  symbol  level  approach  and  to  give  a  more 
detailed  description  of  characteristic  aspects  of  reasoning  and 
decision  making  in  brainlike  systems. 

I  strongly  question  the  statement  (sect.  3.4)  that  there  is  no 
need  for  any  central  control  or  system  clock.  S&A  offer  no 
reasonable  idea  of  how  such  tricky  structures  can  self-organize 
from  unstructured  data  to  allow  the  emergence  of  complex 
knowledge  bases  at  all  and  to  ensure  the  requisite  flexibility, 
giving  the  system  the  chance  to  modify  and  create  symbols 
based  on  persistent  subsymbolic  descriptions  (Smolensky  1988). 
In  this  respect  learning  is  not  a  problem  of  adjusting  weights 
(sects. 3.4, 10.6) but of self-organizing the algorithmic structure.
How  is  one  to  resolve  conflicts  in  a  limited  time  in  large-scale 
systems  of  this  fundamentally  asynchronous  type  if  there  is  no 
helpful  demon  keeping  track  of  all  the  locally  emerging  hypoth¬ 
eses  and  setting  the  right  phase  position?  If  one  has  as  definite  a 
setup  for  one’s  problem  as  in  the  proposed  model  (with  preset¬ 
ting  of  a  definite  and  highly  unitary  structure,  preselected 
objects,  facts,  rules,  and  presetting  the  proper  phase  position  for 
each  symbolic  item  to  be  handled)  then  the  system  cannot  get 
stuck  but  will  behave  as  desired.  But  how  is  one  to  create  the 
appropriate setup to make this approach work so smoothly? My
point is that this will turn out to be at least an equally decisive
problem in dealing with a “real world problem” like image
interpretation.

This  is  not  a  problem  resulting  from  simplification  that  can  be 
resolved  in  a  straightforward  way  (sects.  9,  10).  We  have  gone  the 
route  S&A  recommend  in  section  10.1,  and  implemented  the 
internal  and  external  scan  path  as  knowledge  controlled  atten¬ 
tion  mechanisms  to  decorrelate  the  parallel  (in-phase)  visual 
input and resynchronize it from asynchronously emerging local
hypotheses to an increasingly global consensus, with autonomously
ranking alternative hypotheses at different phase positions
within the period of global rhythmic control processes
(Gross et al. 1992; Koerner & Boehme 1991; Koerner et al.
1987). Synchronization by associative cooperation and local
competition as described in section 7.3 will suffice to do the job
only for small-scale problems.

At  a  more  realistic  scale  of  both  system  and  problem  complex¬ 
ity  there  is  no  guarantee  of  smooth  convergence  to  a  consistent 
global  solution  (or  of  any  decision  at  all)  in  a  limited  time. 
Reasoning in such asynchronous(!) systems does not get triggered
with well-defined structures in space and time; rather, distributed
activation seeds (activated local relational structures) start
locally synchronous oscillations (or better, reverberations) that
have the aggressive tendency to occupy more system resources
to achieve the activation of the most possible representation.
However, with growing system complexity, this is a typical case
of combinatorial explosion among alternative decisions.

Exclusively local control is not a solution for this problem,
even if we take into account that several alternative decisions
can be developed concurrently with the proposed phase labeling.
The frequency and phase position of a locally evolving
oscillation  of  a  relational  structure  is  defined  by  its  size,  struc¬ 
ture,  and  the  sensory  (or  internal)  call  that  triggered  it  (if  you 
accept  the  at  least  partly  analog-type  evaluation  of  input  activity 
in  neurons  and  therefore  also  in  neuronal  oscillating  clusters). 
Hence, there is not only the range of 40-60 Hz observed in early
visual  processing  (small  relational  structures),  but,  with  the 
increasing  dimension  of  the  dynamically  linked  cluster  in  this 
aggressive  competition  for  a  growing  range  of  dominance,  a 
large  variety  of  irregular  frequencies  (and  of  phase  positions 
within  these  frequencies)  emerge.  With  respect  to  neural  pro¬ 
cessing  we  expect  this  range  to  be  between  the  highest  fre¬ 
quency  of  about  40-60  Hz  (complete  matching  of  inputs  to  all 
the  requisite  eliciting  conditions  for  this  parallel  represented 
knowledge  structure)  and  the  lowest  one  (defining  the  largest 
possible  time  interval  in  which  a  partly  matched  representation 
can  self-amplify  by  synchronizing  related  representational 
structures  that  were  not  coherently  active  initially)  which  we  set 
(for  several  reasons)  to  the  4-8  Hz  of  the  hippocampal  theta 
rhythm (Koerner et al. 1990; 1991; submitted). The simpler
such  a  parallel  representation  is,  the  higher  the  probability  it 
will  already  be  activated  initially  by  the  complete  set  of  condi¬ 
tions  (inputs)  and  will  oscillate  with  the  maximum  frequency. 

Any  such  smooth  relation  between  the  relative  global  struc¬ 
ture  of  a  representation  and  its  initial  frequency  of  updating  is 
required  for  stability,  so  that  more  global  representations  with 
lower  updating  frequencies  will  have  a  chance  to  take  increased 
control  of  lower-order  representations  reverberating  at  higher 
frequencies (this is the condition for convergence of the decision
process). Hence reasoning with dynamic linking should not be a
one-step synchronization: Several time scales are to be expected.
The solution is not to replace the definite system clock
by  a  couple  of  definite  frequencies  but  to  allow  the  emergence  of 
this  almost  chaotic  variety  of  candidate  relational  structures  in 
reflexive  reasoning  (characterized  by  its  frequency  and  relative 
phase position) and to guide it to a consistent global solution by
monitoring the emerging globalization tendency of coherent
activities and by setting a (theta-rhythm-like) adaptive system
clock  to  the  most  promising  phase  position.  This  thereby  forces 
the  system  to  a  globally  consistent  solution  by  focusing  the 
search  on  aspects  related  to  this  decision. 

With  reference  to  experimental  evidence  and  to  Minsky's 
(1985) idea of A- and S-brain we proposed the hippocampus as
this unspecific controller (Koerner et al. 1990; submitted). For
such theta rhythm-driven reasoning the limitations on the depth
of reasoning do not apply (see sect. 8.2.6).

We too have been attracted by Miller’s (1956) 7 ± 2 rule; we
accordingly defined the number of alternative solutions the
model system should be able to handle concurrently (thereby
defining short-term memory). However, we related this measure
to the time scale of theta rhythm based on experimental
facts more closely connected to the cognitive quality of neural
processing than to the observed oscillation in early vision (e.g.,
the time interval between feedback-controlled saccadic eye
movement, behavior-related phenomena in hippocampal theta
and EEG recordings, or the statistical distribution of pattern
sequence length in human communication, etc.).

Hence, although I agree that this 7 ± 2 story may support
such a dynamic reflexive-reasoning scheme, I doubt one can
directly  and  superficially  relate  observations  on  an  early  visual 
process  to  the  results  of  a  psychological  experiment  involving 
much  more  complex  processes  and  structures. 
Shastri & Ajjanagadde (S&A) have taken some important and
impressive  steps  toward  understanding  aspects  of  human  cogni¬ 
tive  capabilities.  Although  it  is  still  much  too  soon  to  know  how 
much  of  their  architecture  is  genuinely  explanatory  of  human 
cognition,  they  have  proposed  a  connectionist  system  that 
defines  enough  architectural  and  performance  characteristics  to 
suggest  a  broad  range  of  empirical  tests  and  to  invite  a  variety  of 
important  extensions.  Nevertheless,  the  system  models  only  an 
isolated  portion  of  cognition  -  perhaps  artificially  isolated  -  and 
we  need  to  be  clear  about  what  kinds  of  evidence  would  serve  as 
meaningful  corroboration  of  it. 

Reflexive versus reflective reasoning. S&A’s presentation of
their system as a model of reflexive reasoning only is an approach
that is both laudable and troubling. It is laudable because it
avoids  the  common  tendency  in  artificial  intelligence  work  to 
portray  systems  of  limited  capabilities,  operating  over  restricted 
domains,  as  if  they  already  capture  the  essence  of  some  of  the 
least  understood,  and  most  general,  areas  of  cognition  (McDer¬ 
mott  [1981]  exposes  this  tendency  well).  For  example,  S&A  have 
wisely  refrained  from  making  any  claims  with  regard  to  the 
nature  of  conscious  deliberation,  which  they  consider  to  be 
characteristic  of  reflective  reasoning. 

S&A’s  approach  is  also  troubling,  however,  because  it  serves 
all too well to isolate the model from criticisms of its limitations
and  deflects  many  of  the  questions  that  are  most  in  need  of 
answers.  To  any  question  of  the  form,  “Why  can’t  this  system 
display  characteristic  X  which  is  clearly  present  in  human 
reasoning?”  it  can  be  answered  that  X  is  characteristic  of  reflec¬ 
tive  rather  than  reflexive  reasoning. 

Thus,  the  most  pressing  need  for  empirical  work  related  to 
this model is in corroborating the reflexive/reflective distinction.
Given that this distinction holds up, it will be necessary to
delineate  the  boundaries  of  reflexive  reasoning  in  humans  em¬ 
pirically  before  we  can  accurately  judge  the  adequacy  of  this 
model. 

Moreover,  the  isolation  of  reflexive  reasoning  in  a  model  of 
cognition  raises  as  many  hard  questions  as  it  deflects.  Granted 
that the distinction between reflexive and reflective reasoning is
intuitively appealing, is there sufficient reason to believe that
the two depend on mechanisms and representations that are
essentially different? If so, how do we account for the apparently
smooth integration of the two? How do we account for the
instant availability of the products of one for processing by the
other, or for the ability to give explicit verbal characterizations of
reflexive reasoning, just as we do for reflective? Do the suggested
extensions to the model (e.g., function terms and encoding
soft and defeasible rules) make sense for reflexive reasoning,
reflective reasoning, or both? In learning new rules, what is the
relationship between the reflective and reflexive reasoning
processes and the representations they use?


What’s missing? As S&A have pointed out, there is much that
the model does not yet account for, even within the realm of
reflexive reasoning. In suggesting areas for further work, it is
perhaps most useful to focus on those in which a more complete
account  would  help  the  most  to  provide  corroboration  of  the 
model.  Here  are  a  couple  of  candidates. 

Learning.  In  addition  to  providing  significant  new  constraints 
on  the  form  of  these  representations  and  a  new  source  of 
empirical  tests,  successful  learning  techniques  will  be  required 
before  the  system's  ability  to  scale  up  to  real-world  proportions 
and generality can be demonstrated.

Explaining.  As  pointed  out  in  section  5  of  the  target  article, 
facts are retrievable by query processes but not rules or relationships
between rules. Thus, it remains to be shown how an
explanation of a reasoning process could be given (as in S&A’s
introductory example of Little Red Riding Hood). (Such an
explanation would not necessarily fall within the realm of reflexive
reasoning processes, but still the representations of rules
would have to allow for such an explanation.)

Classical versus connectionist architectures. In their influential
article, Fodor and Pylyshyn (1988a) questioned the viability
of  connectionist  architectures  as  models  of  cognition  except 
insofar  as  they  are  used  to  implement  classical  models,  that  is, 
models  that  embody  compositional  representations  and  struc¬ 
tural  sensitivity  of  processes.  At  the  time,  unfortunately,  the 
idea  of  a  classical  model  was  roughly  identified  with  completely 
general  mathematical  models  of  symbol  manipulation  such  as 
Turing  machines,  and  the  idea  of  a  connectionist  model  was 
roughly  identified  with  relatively  unstructured  (layered  or  com¬ 
petitive)  masses  of  neuronlike  units.  This  led  to  a  sense  of 
paradox  with  regard  to  connectionist  modeling:  It  was  clear  that 
a  biologically  plausible  model  had  to  be  connectionist,  and  yet  it 
was  equally  clear  that  a  connectionist  implementation  of  some¬ 
thing  like  a  Turing  machine  could  not  be  biologically  plausible. 

Models  such  as  that  proposed  by  S&A  are  beginning  to 
suggest  the  way  out  of  this  dilemma  by  showing  that  there  exists 
a biologically plausible middle ground based on more structured
connectionist  architectures  that  manage  to  implement  limited 
versions  of  compositionality  and  structural  sensitivity  but  fail  to 
approach  the  full  generality  of  Turing  machines.  Some  of  the 
most  interesting  psychological  work  will  probably  center  around 
the  empirical  verification  of  the  ways  in  which  humans  fall  short 
of  this  full  generality,  and  some  of  the  most  interesting  philo¬ 
sophical  issues  will  probably  focus  on  refining  our  understand¬ 
ing  of  how  such  general  mathematical  models  can  best  contrib¬ 
ute  to  our  understanding  of  intelligence. 
We are forever in debt to the field of artificial intelligence for
what we have learned from its failures. Among the legacy of
insights from AI is the realization that the understanding of
language in all its forms (including stories, jokes, arguments,
and explanations of events and actions) requires that we be in a
position to bring to bear a virtually limitless array of knowledge
and experience - instantaneously. Researchers in AI, being in
the business of data manipulation, would quite naturally conceive
the problem of “representing our knowledge of the world
and bringing it to bear on the understanding of language” as
threefold: (1) How do the data get into the system; (2) how are
they classified and stored; and (3) how are they gotten to and
processed when needed for a particular task (say, reading and
answering questions about a particular story). (The three problems
are not dealt with independently; each needs to be addressed
with the requirements of the other in mind.)

If  you  suppose  that  the  data  base  consists  of  stored  proposi¬ 
tions  (“facts"  and  “rules"),  and  that  there  are  tens  of  millions  of 
them,  processing  takes  awhile.  There  will  be  searches  through  a 
large  amount  of  data  and  long  chains  of  processing  and  so  the  AI 
model  becomes  unrealistic  as  a  model  of  human  language 
understanding. 

Shastri & Ajjanagadde (S&A) have presented us with a connectionist
solution to the problem of how large numbers of
propositions are stored (the long-term knowledge base, or
LTKB) and activated when the time comes (e.g., when a question
is posed to the system). The connectionist language understander
is much faster because it does not require searches and
sequential  processing  steps  in  the  manner  of  AI  models.  But  at 
the  same  time,  the  S&A  model  perpetuates  a  fundamental 
assumption  of  AI  models  of  language  understanding.  They 
assume  that  “what  we  know  about  the  world"  should  be  thought 
of  as  a  set  of  (encoded)  propositions.  Let  me  call  this  assumption 
the  LTKB-assumption  (or  LTKB-A). 

I  foresee  two  problems  for  LTKB-A  as  a  mode  for  represent¬ 
ing  what  we  know  about  the  world  and  for  showing  how  we  bring 
this  knowledge  to  bear  in  understanding  language.  One  prob¬ 
lem  is  that  the  LTKB  will  contain  too  much,  and  the  other  is  that 
it  could  not  possibly  contain  enough.  It  should  be  noted  at  the 
outset that the questions being raised here are not directed at
the  S&A  model  per  se;  their  model  was  never  proposed  as  rich 
enough  to  be  applied  to  such  problems  as  story  understanding. 
Rather,  the  challenges  are  being  offered  to  call  into  question  the 
plausibility of any model of story-understanding that is based on
LTKB-A. 

The  first  problem  can  be  illustrated  with  S&A’s  Little  Red 
Riding  Hood  story.  Of  all  the  millions  of  items  in  the  LTKB  that 
have to do with children, people in general and their behavioral
tendencies,  people  in  relation  to  children,  wolves,  people  in 
relation  to  wolves,  children  in  relation  to  wolves,  people  in 
relation  to  children  in  relation  to  wolves,  ways  of  getting  hurt, 
and  on  and  on,  how  do  just  the  right  propositions  get  invoked 
(meaning  just  those  propositions  that  are  needed  to  make  sense 
of  the  sentence  “The  wolf  heard  some  woodcutters  nearby  and  so 
he  decided  to  wait")?  One  might  want  to  answer,  "Well,  those 
are  the  propositions  (actually,  one  of  many  possible  sets  of 
propositions)  that  will  make  sense  of  the  sentence  in  the  story." 
And  that  is  no  doubt  true.  And  we,  if  prodded,  can  come  up  with 
such  a  set  of  “sense-making  assumptions”  out  of  all  the  things  we 
know  about  the  world.  But  short  of  giving  the  connectionist 
network  a  (homuncular)  sense  of  what  it  takes  to  make  sense  of 
the  story,  how  does  it  select  just  the  right  pieces  of  background 
knowledge  needed  to  make  sense  of  the  story-sentence?  But 
even  this  is  not  the  end  of  the  problem.  In  addition  to  making 
sense  of  story  lines,  we  can  recognize  when  a  story  line  does  not 
make  sense.  Do  we  explain  this  as  the  reader  of  the  story  failing 
to  find  facts  in  the  LTKB  that  would  make  sense  of  the  story?  But 
this  will  not  do  either.  For  story  readers  can  tell  you  what  would 
have  to  be  the  case  in  order  for  the  story  to  make  sense.  And  they 
certainly  cannot  find  that  in  the  LTKB.  All  of  this  suggests  that 
story  understanding  is  in  many  respects  more  akin  to  story 
writing  than  it  is  to  fact  recalling. 

The  second  problem  is  that  we  are  capable  of  bringing  so 
much  of  our  knowledge  and  experience  to  bear  in  language 
understanding  that  we  cannot  have  all  of  that  stored  as  a  set  of 
propositions  and  rules. 

Walter  and  Jane  get  up  for  a  few  dances  at  a  wedding 
reception. As they return to their table, their friend Jason says,
“You two were great. What beautiful dancing partners - you two
are just like Fred and ...” At this point, Rosemary interrupts
and completes the sentence: “Ethel.”¹ Not everyone will get the
joke, but if you do, a reconstruction of the elements of the joke is
that the dancing couple is being complimented on their dancing.
They are being compared to Fred and someone, so Fred
and someone must be a famous dancing pair (famous because
their first names alone are enough to identify them). We thus
expect the next name to be Ginger. But Rosemary interrupts
with “Ethel,” thus evoking the pair Fred and Ethel Mertz of the
“I Love Lucy” show. Having seen Fred and Ethel Mertz on
television, the thought of them as graceful dancers strikes us as
hilarious. 

Is  “Fred  and  Ethel  Mertz  would  be  ridiculous  as  dancers"  in 
our  LTKB?  In  retrospect,  it  can  seem  as  if  it  must  be.  But  how 
did it get in there? Was it being formulated as we watched the “I
Love Lucy” show, as it were, getting us ready to appreciate the
joke  should  it  ever  be  made?  If  such  a  proposition  is  in  the 
LTKB,  it  is  hard  to  imagine  what  is  not.  For  example,  how  about 
“Fred  and  Ethel  Mertz  were  not  a  couple  on  the  Jackie  Gleason 
show"?  Is  that  in  the  LTKB?  If  it  is,  did  it  get  in  there  as  a  result 
of  watching  the  Jackie  Gleason  show  or  the  “I  Love  Lucy"  show? 
Again,  if  such  a  proposition  is  in  the  LTKB,  it  is  hard  to  imagine 
what  is  not.  The  list  of  who  was  not  on  the  Jackie  Gleason  show  is 
very  long.  But  in  conversation  someone  mistakenly  places  Fred 
and  Ethel  on  the  Jackie  Gleason  show  (a  natural  mistake;  there 
was a neighbor-couple on that show, too) and I immediately spot
the  mistake. 

The  problem  does  not  just  come  up  in  story  understanding. 
Suppose I happen to be in a convenience store, along with
several  other  people,  when  two  men  wearing  ski  masks  hold  up 
the  store.  The  next  day  I  am  asked  questions  by  a  police 
detective, and I answer them as best I can. How are we to think
about  where  the  answers  are  coming  from?  One  way  to  think 
about  this  is  to  think  of  my  witnessing  of  the  events  of  the 
robbery  as  producing  facts  for  the  LTKB.  But  what  facts? 
Everything that was true of the robbery that I was in a position to
know?  The  detective  asks,  “How  tall  was  the  one  with  the  green 
ski mask?” I hesitate. He says, “Was he taller than the cashier?” I
immediately reply “yes.” Was that fact already encoded in the
LTKB? 

One  reason  this  seems  unreasonable  is  that  if  the  answer  to 
every  possible  question  about  our  lives  that  we  can  answer  is 
stored  as  a  fact  or  rule  there  would  just  be  too  many.  Second, 
how  are  we  to  suppose  that  these  facts  get  prized  off  our 
experience?  What  sort  of  mechanism  could  take  our  experience 
in  the  convenience  store  and  produce  from  it  all  the  facts  that 
there  are  in  that  experience  (all  the  answers  to  all  the  questions 
that  I  could  answer  about  that  experience  if  asked)?  The  detec¬ 
tive,  having  spent  a  career  investigating  such  things,  knows  just 
what  to  ask.  Some  of  his  questions  may  strike  me  as  odd.  (Did 
the  one  that  did  the  talking  come  in  the  door  first  or  second?)  Are 
we  to  suppose  that  my  inexperienced  brain  is  going  to  know 
enough  to  load  my  fact  bank  with  the  answers  to  these  (and  all 
other  possible)  specialized  questions? 

What  all  of  this  suggests  is  that  we  should  not  think  of  what  we 
know  about  the  world  as  a  stored  set  of  facts,  even  facts  distrib- 
utively  coded,  which  can  be  accessed  as  needed.  Rather,  we 
should  think  of  experience  as  more  holistically  altering  the 
system  so  that  we  can  produce  such  facts  when  the  need  arises. 
But  also,  alterations  in  the  system  (from  experience)  would  have 
the  result  that  we  get  on  with  the  reading  of  a  story  when  past 
experience supports such facts as would have to be the case in
order  for  the  story  to  make  sense. 

It  is,  of  course,  not  easy  to  think  concretely  about  how  to 
model  such  a  system.  But  then,  we  are  talking  about  the  human 
mind. Nobody said it would be easy.

NOTE 

1. The joke is borrowed from an episode of the television program
“Cheers.”
Shastri & Ajjanagadde’s (S&A’s) target article represents a potential
milestone in the cognitive science/psychology of human
reasoning. Their proposal compels a departure from the more
traditional logicist AI perspective, in which the development
and implementation of more-or-less formal calculi have been the
goals (Braine 1978; Johnson-Laird 1983; Johnson-Laird & Byrne
1991; Rips 1983; but see Oaksford & Chater 1992a). S&A note -
from psychological considerations - that humans must continuously
compute rapid systematic inferences over very large
knowledge bases, but they also note - from computational
considerations - that these inferences must be of a limited kind
and capacity. Human reasoning springs, in their model, not
from general-purpose deductive machineries but from the natural
dynamics of interacting neural representations. S&A’s approach
makes an appropriate rejoinder to Fodor and Pylyshyn’s
(1988b) recent criticisms of connectionism (Chater & Oaksford
1990). They have shown that - within the nonlogicist AI traditions
of connectionism, parallel marker-passing architectures
(Fahlman 1979; 1981; Hendler 1987), and computational neuroscience
(Churchland et al. 1989) - a productive synthesis of
psychological, computational, and neurobiological evidence can
be brought to bear on the central cognitive problem of
reasoning.

In  this  commentary,  we  briefly  explore  the  potential  of  S&A’s 
model  to  illuminate  issues  in  the  psychology  of  reasoning. 

First,  by  emphasising  the  need  for  computationally  tractable 
accounts of human inference, S&A identify an important source
of constraint on psychological models (Oaksford & Chater 1992a;
1992b). In cognitive psychology, theories of reasoning have
concentrated on empirical adequacy. Constrained laboratory
tasks involving at most two or three premises provide the data
that these theories attempt to explain (see, e.g., Evans 1982;
1989; Johnson-Laird & Byrne 1991). Ultimately, however, psychological
theories must generalise to real human reasoning that
may implicate the whole of a person’s world knowledge in an
inference (Fodor 1983). Current reasoning theories, however,
invoke processes that, when generalised to large knowledge
bases, are computationally intractable (Oaksford & Chater
1992a; 1992b). Even if they fully “account” for the empirical
data, they could not be psychologically real. S&A place the
emphasis in just the right place: Realistic theories of human
reasoning must not only be tractable, but tractable using biological
hardware. [See also Tsotsos: “Analyzing Vision at the Complexity
Level” BBS 13(3) 1990.]

Second, S&A’s model appears to generalise naturally to everyday,
defeasible inference. The deductive inferences typically
investigated by reasoning researchers are computationally intractable
by symbolic means, but everyday defeasible inference
is worse: The application of a single rule is intractable (McDermott
1986; Oaksford & Chater 1991). Recent claims that at least
one extant theory of deduction - mental models (Johnson-Laird
& Byrne 1991; see also BBS multiple book review of Johnson-Laird
& Byrne’s Deduction, BBS 16(2) 1993) - generalises to
account for everyday reasoning founder on a problem for which
S&A provide a natural solution (Chater & Oaksford 1993;
Oaksford 1993). The plausible but defeasible conclusion, from “I
turned the key of my car and it has not started,” is, “The ignition
is faulty.” But why is this default conclusion to be preferred to,
“The engine has been removed overnight”? The lack of an
answer in mental models theory suggests that the real problems
of defeasible reasoning are widely unappreciated (Chater &
Oaksford 1993; Garnham 1993; Oaksford 1993). S&A show how
dynamic bindings and type restrictions can be generalised to
provide binding strengths and type preferences that can differentiate
between possible defeasible conclusions.

Third, S&A’s use of type restrictions may explain one common
bias in reasoning tasks (Evans 1989). In “matching bias,” subjects
tend to ignore negations, instead matching named items
(Evans 1972; 1983; 1989). When asked to construct a true
instance of the rule, “If there is a blue triangle on the right, then
there is not a red square on the left,” they may place a blue
triangle on the right and a red square on the left (Evans 1972).
Oaksford and Stenning (1992) have shown that this bias is due to
a difficulty in constructing appropriate contrast-classes. The
materials used in these experiments leave the intended
contrast-class ambiguous, forcing subjects to match. When the
ambiguity is removed, matching bias disappears. Oaksford and
Stenning (1992) suggest that type restrictions on a predicate’s
arguments constrain the contrast-classes identified by a negated
constituent. For example, “He did not travel to Manchester by
train” (italics = rising intonation), identifies modes of transport
as the appropriate contrast-class because the ternary predicate
travels has the following associated type restrictions: travels
(traveller: x, destination: y, mode of transport: z). Typing is of
course not unique to S&A’s proposal, but we feel that their
approach is more likely to generate constrained, tractable typing
mechanisms (perhaps explicitly invoking the notion of
contrast-class).

Fourth, S&A divide human reasoning into two kinds: Reflexive reasoning
is rapid, unconscious, and underpins on-line prediction and
explanation of the world, anaphor resolution, text elaboration, and so
on. Reflective reasoning is conscious, and involves external memory
aids (pencil and paper) and external representational systems
(diagrams, pictures, mathematics, logic, etc.). Other connectionist
researchers interested in reasoning (Rumelhart 1989; Rumelhart et al.
1986) have advocated this essentially Vygotskyan distinction that
logical reasoning is a function of the internalisation of external
representational systems. [See also Hanson & Burr: “What
Connectionist Models Learn” BBS 13(3) 1990.] This division leaves
traditional reasoning theories such as mental logics and mental models
without a natural problematic. Those theories seem motivated by
requirements and notations from explicit deductive reasoning and are
then generalised to reflexive modes of inference (see, e.g.,
Johnson-Laird 1983). In our view this is misguided: People are
reflexive reasoners first; the mechanisms of reflexive reasoning are
co-opted to perform deductive inference. Unsurprisingly, people are
not particularly good at the latter.

Two aspects of our recent work support this position. First, tasks in
which deductive performance is poor do not allow subjects to use
external aids like pencil and paper. If such reasoning requires
external aids, then the simple expedient of providing them should
improve performance. In some pilot work we supplied subjects with
pencil and paper in an abstract version of Wason’s (1966) selection
task and gave them one minute to solve it. Solution rates were at
around 60% compared to only around 4% in the standard task. Second,
Oaksford and Chater (in press) have argued that performance on a
variety of conditional reasoning tasks can be explained by the
reflexive use of a predict-and-explain strategy of the kind that S&A
implement in their model.

Finally, S&A’s model of reasoning is at the right level to contact
neuropsychological investigations of frontal lobe function.
Neuropsychological evidence constrains many other areas of cognitive
inquiry but not human reasoning. This is anomalous because the
impairment of hypothesis testing (Milner 1963) and planning (Shallice
1982) performance (areas also investigated by reasoning researchers)
in frontal lobe damage is well known. [See also BBS multiple book
review of Shallice’s From Neuropsychology to Mental Structure, BBS
14(3) 1991.] We have begun to use standard reasoning tasks with
frontal patients (Oaksford et al. 1992b), patients with Parkinson’s
disease (Malloch et al. 1992), and patients with closed head injuries
(Oaksford et al. 1992a), with some interesting results. It may be
possible to perform computational “lesion” experiments on S&A’s model
to see whether qualitatively similar behaviour results. Two factors
make S&A’s model of particular significance here. First, it is
sufficiently well specified to be implemented and to make explicit
predictions. This contrasts with the Norman-Shallice (1985) model.
Second, S&A’s reasoning architecture makes contact with “the rest of”
cognition. Computational models of frontal lobe function such as
Dehaene and Changeux’s (1991), while using biologically motivated
building blocks, use architectures that are tied to specific tasks.

The range of implications of a theory or model is one guide to its
potential for influence. We believe that S&A have provided a technical
approach rich in experimental possibilities. This is not to say that
it has solved everything; there are several issues S&A have not
tackled: They may make too much of functionally questionable
neurophysiological results; indeed the “neurobiological” plausibility
of their scheme is conceptual rather than factual. Their solution to
the variable-binding problem works only for essentially localist
representational schemes (or localist views of distributed schemes).
Seen as an ingenious use of the time domain to implement
marker-passing, their proposal is of course no more (or less!)
powerful in itself. Although S&A address the intractability of
reasoning over very large data bases, other computational problems
well known in AI knowledge representation remain to be resolved in
detail (e.g., the frame problem may reappear in the appropriate
specification of typing categories). Finally, it is not clear how the
rest of a cognitive architecture can “know” (or “learn”) how to
interact effectively with the particular nodes and oscillations of one
of S&A’s inference architectures. Despite such objections, S&A’s model
introduces important new constraints, and a useful expressive
vocabulary, to the psychology of reasoning.

This is definitely a step in the right direction on the road to a
computationally and biologically, as well as psychologically,
constrained theory of human reasoning.

Unlike researchers who try to prove that symbolic descriptions of
human cognition should be replaced by descriptions of neural
mechanism, Shastri & Ajjanagadde (S&A) are engaged in the
scientifically more fruitful enterprise of relating the two levels of
description to each other. The particular computation analyzed in
their article is forward inference, for example, to infer the
proposition own(Mary, Book1) from the implication give(x, y, z) →
own(y, z) and the fact give(John, Mary, Book1). At the symbolic level,
such inferences consist of two computations. First, a match predicate
is applied to verify that the given fact instantiates the antecedent
of the implication; the output is the set of variable bindings that
make the match true: Match[give(John, Mary, Book1), give(x, y, z)] =
{x/John, y/Mary, z/Book1}. Second, the bindings are used to substitute
constants for variables in the consequent: Substitute{x/John, y/Mary,
z/Book1}[own(y, z)] = own(Mary, Book1). The match and substitute
procedures constitute a mechanism for identifying and propagating
variable bindings.
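
The two symbolic computations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not S&A's implementation: the tuple encoding of predicates and the function names `match` and `substitute` are assumptions made here, and every argument of the pattern is assumed to be a variable.

```python
# Predicates are encoded (by assumption) as (name, args) tuples.

def match(fact, pattern):
    """Return the variable bindings under which `fact` instantiates
    `pattern`, or None if there is no match. All pattern arguments
    are treated as variables in this sketch."""
    (f_name, f_args), (p_name, p_args) = fact, pattern
    if f_name != p_name or len(f_args) != len(p_args):
        return None
    bindings = {}
    for var, const in zip(p_args, f_args):
        # A variable that occurs twice must be bound to the same constant.
        if bindings.setdefault(var, const) != const:
            return None
    return bindings


def substitute(bindings, term):
    """Replace each bound variable in `term` by its constant."""
    name, args = term
    return (name, tuple(bindings.get(arg, arg) for arg in args))


fact = ("give", ("John", "Mary", "Book1"))
antecedent = ("give", ("x", "y", "z"))
consequent = ("own", ("y", "z"))

bindings = match(fact, antecedent)
inferred = substitute(bindings, consequent)
```

Here `bindings` corresponds to {x/John, y/Mary, z/Book1} and `inferred` to own(Mary, Book1).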

The synchronicity hypothesis proposed by S&A claims that the brain
performs the match and substitute computations by (1) encoding
variable bindings through the synchronous firing of neurons (or
clusters of neurons) which represent, respectively, the variable and
the constant bound to it, and (2) propagating bindings by linking the
neuron representing variable occurrence x' in the antecedent to the
neuron representing variable occurrence x'' in the consequent in such
a way that if x' fires in a particular phase at time t, then x'' will
fire in that same phase at time t + d(t). This mechanism searches
through the set of implications in parallel and the time required to
infer a particular conclusion is independent of the size of that set.
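
A toy rendering of this phase-based scheme may help fix ideas. It is a deliberately crude sketch under stated assumptions: phases are reduced to small integers, the delay d(t) is ignored, and the role names other than "recipient" (giver, object, owner, owned) are hypothetical labels introduced here, not taken from S&A.

```python
# Each active entity fires in its own phase of the oscillation cycle.
entity_phase = {"John": 0, "Mary": 1, "Book1": 2}

# The fact give(John, Mary, Book1): each role node of "give" fires in
# the phase of the entity that fills it.
role_phase = {
    ("give", "giver"): entity_phase["John"],
    ("give", "recipient"): entity_phase["Mary"],
    ("give", "object"): entity_phase["Book1"],
}

# The rule give(x, y, z) -> own(y, z) as links between role nodes.
rule_links = {
    ("give", "recipient"): ("own", "owner"),
    ("give", "object"): ("own", "owned"),
}


def propagate(role_phase, rule_links):
    """One propagation step: every consequent role node starts firing
    in the phase of the antecedent role node it is linked to."""
    updated = dict(role_phase)
    for src, dst in rule_links.items():
        if src in role_phase:
            updated[dst] = role_phase[src]
    return updated


def read_bindings(role_phase, entity_phase, predicate):
    """Recover role/filler pairs of `predicate` from shared phases."""
    phase_to_entity = {phase: ent for ent, phase in entity_phase.items()}
    return {
        role: phase_to_entity[phase]
        for (pred, role), phase in role_phase.items()
        if pred == predicate
    }


after_step = propagate(role_phase, rule_links)
own_bindings = read_bindings(after_step, entity_phase, "own")
```

After one step the roles of "own" fire in the phases of Mary and Book1, which is the phase-coded counterpart of own(Mary, Book1). Note that `read_bindings` plays the part of an outside observer; nothing in the sketch models how the system itself would detect the synchrony.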

The synchronicity hypothesis generates two novel psychological ideas.
First, S&A explain the limit on working memory capacity as a
consequence of the number of temporal phases the brain can keep
distinct, the best attempt so far to ground this well-known cognitive
limitation in neural mechanisms. The novel idea is embedded in the
implication that although there is a limit to the number of entities
that can be considered simultaneously, there is no limit on the number
of predicates that can be asserted about those entities. It is not
entirely clear how to distinguish between entities and predicates, but
this hypothesis might nevertheless bring some clarity to the
literature on working memory capacity limitations.

Second,  the  synchronicity  mechanism  implies  that  the  brain 
spreads  variable  bindings,  rather  than  activation,  through  long¬ 
term  memory.  S&A  add  the  plausible  assumption  that  the 
binding  information  is  attenuated  with  each  propagation  step; 
eventually  it  becomes  too  fuzzy  to  support  further  inferences. 
Intuitively,  this  idea  differs  substantially  both  from  the  notion  of 
spreading  activation  (where  activation  is  a  content-free  quantity) 
and  from  the  notion  of  gradual  decay  of  working  memory 
elements,  but  formal  analyses  are  needed  to  verify  that  these 
three  mechanisms  generate  different  predictions. 

S&A  also  propose  a  limit  on  the  number  of  predicate  instantia¬ 
tions  that  can  be  active  simultaneously  and  a  constraint  on  the 
syntactic  form  of  inference  rules  used  in  backward  chaining. 
These two proposals, however, are not derived from the hypothesized
neural mechanism. Both are identified at the symbolic level and
motivated with traditional complexity arguments.

The least comprehensible aspect of the synchronicity hypothesis is
that it encodes variable bindings in a relation which is not
accessible from inside the brain itself. That neural cluster A is
firing in synchrony with neural cluster B is detectable by an outside
observer, but S&A deny there is any module in the brain that can
detect this fact. This takes getting used to. How can a relation which
cannot be accessed from inside the system affect further processing?
How are the conclusions derived by the proposed mechanism made
available to other cognitive processes, for example, planning or
decision making?

The proposed mechanism is less integrated into psychological theory
than one might have wished. Even as they appeal to behavioral data to
support their case, the authors deny that the dynamic storage they are
describing can be identified with the working memory studied by
psychologists. In order to tell their story, S&A have to introduce
what they call an overt short-term memory, an intermediate memory, and
an attentional spotlight. No neural mechanisms are supplied for these
components and the relations between them and the synchronicity
mechanism are left unspecified. Finally, the distinction between
reflexive and reflective reasoning, although solidly grounded in
behavioral data, comes with some unresolved conceptual questions: Why
are there two reasoning mechanisms? Under what circumstances is one or
the other mechanism applied? If reflexive reasoning is so efficient,
why do people ever resort to reflective reasoning?

In the end, the synchronicity hypothesis may or may not win over
alternative hypotheses in the race to account for neuropsychological
data. The deeper significance of S&A’s article is that it shows that
cognitive science is ready to set aside the fruitless debate over
whether the mind should be described at the level of neural mechanisms
or at the level of symbolic computations and to begin the difficult
but important task of specifying how the brain carries out the mind’s
computations.

Coherence in neural activity can be useful to bind a neural
representation together. This is an old idea (Hebb 1949), but still a
convincing one. This idea has been elaborated again in recent
experimental and theoretical literature concerned with the visual
areas of the cortex (experimental papers are cited in the target
article; for a discussion of some theoretical issues see Johannesma et
al. 1986; von der Malsburg 1986; and Palm 1986). It has turned out
that the neural networks in these areas can produce (and make use of?)
coherence on a fine temporal scale (msec) in addition to a coarser
correlation of activity in the range of tens to hundreds of
milliseconds. The former has been called “event-coherence,” the latter
“rate-coherence.” The new idea was that event-coherence could be used
to bind the parts (such as edges and corners) of different objects
together while rate-coherence would simply indicate that these
different objects were presented at the same time in one scene.

Shastri  &  Ajjanagadde's  (S&A’s)  target  article  transports  this 
idea  from  the  representation  of  visual  scenes  to  the  representa¬ 
tion  of  knowledge  in  expert  systems:  Concept  nodes  are  broadly 
activated  to  represent  the  presence  of  concepts,  whereas  event- 
coherence  is  used  to  bind  concepts  to  roles  in  predicates.  In 
other  words,  the  essential  idea  is  to  use  fine  timing  or  event- 
coherence  for  variable  binding. 

This  is  a  very  appealing  idea  and  it  is  illustrated  quite  convinc¬ 
ingly  in  Figures  1  to  13.  The  idea  of  combining  this  system  with 
an  IS-A  hierarchy  is  also  convincing  and  indeed  very  useful  if  not 
even  necessary  for  any  real  application  of  this  system. 

Only  the  representation  of  so-called  long-term  facts  in  terms 
of  presynaptic  inhibition  of  inhibitory  synapses  seems  slightly 
awkward  and  implausible  from  a  neuroscientist's  point  of  view. 
One  wonders  whether  the  same  cannot  be  achieved  by  detec¬ 
ting  fine  coincidence  through  excitatory  synapses.  Another 
problem  with  the  representation  of  long-term  facts  is  the  learn¬ 
ing  of  these  facts.  During  learning  these  presynaptic  inhibitions 
must  be  formed  somehow,  so  that  axons  from  cells  representing 
“Mary,”  for  instance,  have  to  find  their  way  to  the  right  terminals 
connecting  the  right  predicate  argument  or  role  (“recipient  ”)  to 
that  instantiation  of  the  “give”  predicate  that  was  chosen  to 
represent  the  particular  fact  that  “John  gave  Bookl  to  Mary.” 

Furthermore, one consequence of this fact is that John does not own
the book any more, so there should be a way of disconnecting the
corresponding inhibitory synapse in the particular “Own” predicate
that says that “John owns Book1.” The problem is actually even worse,
since simple disconnection does not rule out eventually concluding
that “John owns Book1” from other facts. So the question remains: How
does the inference system deal with negative evidence?

Another problem with the predicate “give” is that “John gave Book1 to
Mary,” but later Mary may give Book1 back to John. After that, who
owns Book1 according to the inference system? Probably both Mary and
John. How can the system be prevented from drawing both conclusions? I
believe it would be much more reasonable to store as long-term facts
not the propositions about “giving” but rather the resulting
propositions about ownership. Thus some of the inferences should
actually be drawn before storage and then stored as long-term facts.
Incidentally, the use of the term long-term fact is also a bit
misleading, because it apparently need not mean a fact in long-term
memory but rather a fact that is memorized during the apprehension of
a short story. The inference system presented here is clearly oriented
toward answering queries about short stories rather than constituting
a consistent complete world-model as in long-term memory. Thus the
target article does not address the problem of what to store and what
to infer, which is fundamental for the organization of large data
bases as well as for the understanding of human long-term memory.

Despite these obvious problems, most of which are not specific for the
idea presented in the target article, S&A illustrate quite
convincingly how synchrony can be used for dynamic binding. Toward the
end of their paper, however, when it comes down to the technical
problems (sects. 5 & 6), the nice idea gets marred with a number of
strange and clumsy constructions, in particular the “up/down switches”
and the “predicate banks.” I wonder whether one could perhaps get
along without these constructions.

Predicate banks. I think the idea of an IS-A hierarchy should be
extended to the predicates: There is a general “give” and under it
there is a special “give” representing “John gave Book1 to Mary.”
Systematic relationships as in Figure 6 should be represented between
general predicates (“give,” “buy,” “own,” etc.) and not between their
special instantiations as in Figure 12. The reason for this is simply
that the connections between “give,” “own,” and so on should be
implemented only once (between general predicates) and not between all
the different instances of “give,” “own,” “buy,” that may be
represented as long-term facts. Thus Figure 12 should contain a
general “own” ellipse between “can sell” and “own” and a general
“give” ellipse between “own” and “give.” Furthermore, the connecting
paths should be from “can sell” (which is general) via general “own”
to particular “own,” and from general “own” via general “give” to
particular “give.” This would not change the arguments given in S&A’s
paper; it would only increase the length of the shortest path by two
steps. Using this kind of IS-A hierarchy also for the predicates would
be a simple alternative to S&A’s introduction of “predicate banks.”

Up/down switches. I think these switches can also be replaced by a
more plausible and perhaps simpler mechanism if one uses a distributed
representation of the concepts in terms of Hebbian cell assemblies
(Hebb 1949; Palm 1982; 1990) instead of single nodes. In this
framework it is conceivable to represent an IS-A hierarchy (of
concepts or predicates) in terms of set-containment of the
corresponding assemblies (sets of nerve cells).

For definiteness let us assume concepts that are higher in the
hierarchy are represented as smaller assemblies. Then upward inference
could be performed by raising the average threshold of neurons in the
network (Palm 1982), thus forcing the representation to become
sparser. Conversely, downward inference could be performed by lowering
the threshold. Furthermore, the use of cell assemblies for the
representation of concepts makes it possible to represent similarity
between concepts in the degree of overlap between the corresponding
assemblies.

Another improvement of the proposed inference system could be the use
of more than only binary logical values for the certainty of
propositions. One could represent the certainty or confidence for a
proposition by means of the rate of firing of the corresponding unit.
This is a little problematic with the model proposed in the target
article, because it uses phase-coherence with respect to a fixed
frequency to represent binding. The more general idea to use
event-coherence (vs. rate-coherence), as mentioned in the beginning,
does not have this problem.

Thus a number of technical problems can perhaps be solved and the
representational scheme improved considerably by using cell assemblies
instead of single units for the representation of concepts and
event-coherence instead of phase-coherence for the representation of
bindings. These ideas would of course have to be worked out more
accurately, but I believe they could help to make the proposed system
more amenable to neurobiological theorizing and perhaps even more
useful in practice.

I also found the target article very useful for triggering thoughts on
the practical use for event-coherence - perhaps even in the visual
cortex.

Shastri & Ajjanagadde (S&A) have found an interesting way to exploit
the representational potential of time in neural network models. In
most “neural software engineering,” a correspondence is defined
between some of the state vectors of the model and interpretations in
an application domain. The representational power of a state is
limited by its dimension; for example, a network of N binary-valued
nodes can represent at most 2^N different things. But without
allocating any further hardware resources, that representational power
can be increased to 2^(NT) by interpreting length-T temporal sequences
of states instead of individual states. It is a space-time trade-off:
It takes T times longer to represent something this way, but
2^(N(T-1)) times as many things are representable.
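
The arithmetic of this trade-off can be checked directly. The sketch below is only an illustration: the function name and the example values N = 4, T = 3 are assumptions chosen here, not figures from the commentary.

```python
def representable(n_nodes: int, seq_len: int = 1) -> int:
    """Number of distinct length-`seq_len` sequences of states of
    `n_nodes` binary-valued nodes."""
    return 2 ** (n_nodes * seq_len)


N, T = 4, 3                      # illustrative values only
single_state = representable(N)  # 2^N
sequences = representable(N, T)  # 2^(NT)

# Reading sequences instead of single states multiplies the count of
# representable things by 2^(N(T-1)), at the cost of T time steps.
gain = sequences // single_state
```

With these values, 16 single states grow to 4096 sequences, a gain factor of 256 = 2^(4*2).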

S&A  have  found  a  situation  in  which  this  trade-off  is  an 
impressively  good  deal.  It  is  important  to  have  the  power  to 
represent  a  great  variety  of  variable  bindings  but  most  will  never 
actually  get  represented  in  practice,  and  most  of  those  that  do 
will  not  need  to  be  represented  for  very  long.  Hence,  it  is  better 
to  spend  some  time  rebuilding  the  representational  setting  each 
time  a  binding  needs  to  be  represented  than  to  keep  lots  of  spare 
representational  capacity  on  tap. 

The space-time trade-off in this system is partly illusory, because
its dynamics is order T - 1 in the state variables, where T is the
number of phases in a fundamental period. This is because maintenance
of synchrony requires connections with time-delay T - 1 between the
ρ-btu nodes representing corresponding parts of rule-related
predicates. Consequently, so far as the dynamics is concerned, a
“state” has N(T - 1) components. Whether temporal synchrony is
implemented with simple delay lines or the elaborate mechanism in
S&A’s section 7.3, a buffer of size N(T - 1) has to be directly or
indirectly implemented for the system to run. These extra degrees of
freedom can be thought of as implemented at a subcellular level.
Computer simulations have to dedicate memory to them.

Although  temporal  coincidence  plays  a  key  role  in  this  sys¬ 
tem,  the  oscillations  seem  inessential  to  its  operation.  What 
matters  is  that  fact  predicates  “observe”  whether  their  argu¬ 
ments  fire  synchronously  with  any  constants  at  least  once  during 
a  reasoning  episode,  and  that  variables  linked  by  rules  eventu¬ 
ally  fire  at  the  same  time  as  any  constant  to  which  they  may  be 
bound.  Periodic  reiteration  of  these  coincidences  seems  a  waste 
of time. The only important role of the oscillations is in keeping
variables linked by rules synchronised with each other. That
way a constant synchronised with one is synchronised with all.
The synchronisation among rule-related variables would be
maintained by instantaneous propagation of activations, if only
that were possible. Instead, it is achieved (eventually) by delaying
propagation for nearly one basic oscillation period, or by
more elaborate mechanisms that require at least one cycle to
take effect. Perhaps there is a cheaper way.

This system’s elegant distribution of representations over time is not
matched by an elegant distribution of representations over nodes.
Grandmother-cell (or cell cluster) representations of constants and
variables are used throughout. This may be just as well for expository
purposes, but greater efficiency and potentially interesting
properties may arise from more fully distributed representations. A
set of C constants, for example, can be represented as patterns
distributed over O(log C) nodes. (A sparser representation using α log
C nodes, with α > 1 but nevertheless O(log C), might have more useful
properties.) Smolensky, Dolan, and others have developed “tensor
product” binding methods that use distributed representations of
constants and variables (Dolan & Smolensky 1989). Unfortunately, these
methods require (α log C)(α log V) nodes to represent bindings among C
constants and V variables. C and V refer to all constants and
variables, not just those used in an episode of reasoning. It seems
feasible, however, to distribute the tensor product over time, using a
mixture of the tensor product binding and phase binding approaches
(Rohwer 1993). This offers the combined advantages of each system. The
total number of nodes required to represent the constants and
variables is reduced from the grandmother-cell system’s O(C + V) to
O(log C + log V). No extra nodes are needed to represent the tensor
product, but some extra time steps are needed, as many as there are
bindings in the episode of reasoning. In addition to providing
increased efficiency, the distributed representations might give such
a system interesting generalisation properties found in the more
popular neural network models.
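
To give a feel for the node counts being compared, the following sketch tabulates them for one example. It is an illustration under simplifying assumptions made here: logarithms are taken to base 2 and rounded up, the sparseness factor is set to 1, and the values of C and V are invented for the example.

```python
def code_length(n_items: int) -> int:
    """Smallest number of binary nodes that gives each of `n_items`
    a distinct pattern, i.e. ceil(log2(n_items)) for n_items >= 1."""
    return (n_items - 1).bit_length()


def node_counts(n_constants: int, n_variables: int) -> dict:
    bits_c = code_length(n_constants)
    bits_v = code_length(n_variables)
    return {
        # one node (or cluster) per constant and per variable
        "grandmother_cell": n_constants + n_variables,
        # distributed codes for constants and variables only
        "distributed": bits_c + bits_v,
        # extra nodes a spatial tensor product of the two codes needs
        "tensor_product": bits_c * bits_v,
    }


counts = node_counts(n_constants=1024, n_variables=64)
```

For 1024 constants and 64 variables this gives 1088 grandmother-cell nodes against 16 distributed nodes, with 60 further nodes if the tensor product is laid out in space rather than distributed over time.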

This is an interesting model that is consistent with several common
intuitions about human reasoning. The image of parallel chains of
inference unfolding and refining themselves over time with related
elements bound together through phase synchrony is appealing. Despite
a widely shared belief that many of our mental representations have an
intrinsically sequential character (e.g., our memory for music), few
models have succeeded so well in using time as a representational
device. Unfortunately, the model is devoid of empirical support. It is
so rich in assumptions and detail that vast quantities of confirming
data would be required for it to merit serious consideration. And the
little that is known about human reasoning makes such data unlikely.

The chief source of evidence appealed to by Shastri & Ajjanagadde
(S&A) is neurophysiological. They argue vehemently for the model’s
“neural plausibility.” But the data they depend on for this vague
claim are disconnected from the domain they are modeling. The
strongest evidence they muster is a suggestion that “the dynamic
binding of visual features pertaining to a single object may be
realized by the synchronous activity of cells encoding these features”
in the cat visual cortex (sect. 7.1.1). Even if we accept this
evidence at face value, does it tell us anything at all about how
people reason? The binding of object features may depend on temporal
synchrony, but the model posits a particular parametrized temporal
synchronization process that binds the arguments of abstract
predicates to their fillers and instantiates the processing of
abstract chains of inference. The data are so far removed from the
domain of study that even S&A admit that the only relation is
analogical. Whether or not the brain makes use of temporal synchrony
in object perception has no bearing on how we reason abstractly,
especially because we can guess only roughly at what either of the
underlying psychological processes are. S&A lift a rich and promising
metaphor (binding through temporal synchrony) to the status of
scientific evidence. This could be more easily ignored if it were not
so basic to their argument.

S&A also report evidence suggesting that their model allows them to
predict working memory capacity. But many more firmly grounded
theories already account for working memory data (e.g., Baddeley
1986).
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A  model  of  human  reasoning  should  presumably  have,  first 
and  foremost,  implications  for  human  reasoning.  Accordingly, 
S&A  make  a  few  relevant  predictions  (sect.  8.2.6)  but  supiurrt- 
ing  data  or  even  suggestive  examples  are  not  provided.  Some  of 
the  model’s  predictions  seem  rather  arbitrary  and  therefore 
unlikely  to  be  confirmed,  especially  those  having  to  do  with 
restrictions  on  when  variables  can  appear  in  the  antecedent  or 
consequent  of  a  rule.  Other  predictions  are  just  misguided.  For 
example,  the  model  predicts  that  people  can  make  transitive 
inferences using only a small number of relations, but examples
of transitive inference that pose no difficulty for people
can be constructed easily, even from esoteric relations. Consider
the transitive sequence Pimplier(teenager_1, teenager_2) &
Pimplier(teenager_2, teenager_3) & ... & Pimplier(teenager_{n-1},
teenager_n). The inference Pimplier(teenager_1, teenager_n) can be
drawn effortlessly (even reflexively). People are terrific at
constructing linear orderings when the context clearly calls for one.
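
The structure of this example can be made concrete with a minimal Python sketch that treats the Pimplier facts as a chain of ordered pairs and checks whether the end-to-end inference follows by transitivity. The function name entails, the variable names, and the chain length n = 12 are illustrative assumptions; nothing here is taken from S&A's model, and the sketch says nothing about how people actually perform the inference.

```python
# Hypothetical illustration: all names and the chain length are assumptions.
def entails(facts, a, b):
    """Return True if relation(a, b) follows from `facts` by transitivity."""
    successors = {}
    for x, y in facts:
        successors.setdefault(x, set()).add(y)
    frontier, seen = [a], {a}
    while frontier:
        node = frontier.pop()
        for nxt in successors.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


n = 12  # arbitrary chain length, well beyond a "small number of relations"
pimplier_facts = [(f"teenager_{i}", f"teenager_{i + 1}") for i in range(1, n)]

# The end-to-end inference holds; the reverse direction does not.
chain_holds = entails(pimplier_facts, "teenager_1", f"teenager_{n}")
reverse_holds = entails(pimplier_facts, f"teenager_{n}", "teenager_1")
```

The traversal visits each link of the chain once, which is one way to picture a linear ordering being read off in a single pass.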

The model's problems begin with its failure to capture aspects
of people’s fallibility that much simpler connectionist models are
able to capture easily. For example, people do not always
respect  the  logical  principle  of  category  inclusion.  To  illustrate, 
when  evaluating  the  strength  of  an  argument  of  the  form 
“premise  statement,  therefore  conclusion  statement,"  people 
often  fail  to  take  inclusion  relations  between  premise  and  con¬ 
clusion  objects  into  account.  For  example,  they  fail  to  judge  an 
argument  such  as  “Animals  use  norepinephrine  as  a  neuro¬ 
transmitter,  therefore  reptiles  do”  as  perfectly  strong.  In  fact,  on 
average  they  judge  it  substantially  weaker  than  the  argument 
“Animals  use  norepinephrine  as  a  neurotransmitter,  therefore 
mammals do.” A connectionist network much simpler than the
one  described  in  the  target  article  provides  a  straightforward 
account  of  this  finding  (Sloman  1993).  People  seem  (in  their 
reflexive  state)  to  reason  less  in  accordance  with  many  of  the 
rules  of  logic  or  IS-A  hierarchies  than  they  do  with  heuristics 
that  depend  on  similarity,  metaphor,  and  the  surface  structure  of 
statements  (see,  e.g.,  Klayman  &  Ha  1987;  Lakoff  1987;  Wason 
1960). 

Examples  of  people’s  tendency  to  rely  on  similarity  over  logic 
are  found  in  demonstrations  of  violations  of  the  conjunction  rule 
of  probability.  The  conjunction  rule  states  that  because  the 
extension  of  the  conjunction  of  events  A&B  necessarily  includes 
the event B, P(B) ≥ P(A&B). This rule is violated in that, for
example,  people  who  are  given  a  description  of  a  man  who  is 
intelligent,  unimaginative,  compulsive,  and  generally  lifeless 
are  more  likely  to  infer  that  he  is  an  accountant  who  plays  jazz  for 
a  hobby  (A&B)  than  they  are  to  infer  that  he  simply  plays  jazz  for 
a hobby (B), apparently because the description is more
representative of A&B than it is of B alone (Tversky & Kahneman
1983).  The  point  is  not  that  an  account  of  these  particular 
phenomena  could  not  be  generated  using  the  representational 
scheme  of  the  model  described  in  the  target  article,  but  rather 
that  the  model  gives  us  no  a  priori  reason  to  expect  these  basic 
characteristics  of  human  reasoning.  The  model  serves  as  an 
existence  proof  that  a  network  of  nodes  and  links  can  use 
temporal  synchrony  to  traverse  an  inferential  dependency 
graph.  But  many  interesting  qualities  of  human  reasoning  are 
not  explained  by  such  a  graph. 
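
As a small worked illustration of the conjunction rule discussed above, the following Python sketch computes P(B) and P(A&B) from a joint distribution and flags a pair of judged probabilities that violates the rule. The joint probabilities and the judged values are invented for illustration only; they are not data from Tversky & Kahneman (1983), and the function names are assumptions.

```python
def marginal_and_conjunction(joint):
    """joint maps (a, b) truth-value pairs to probabilities summing to 1.

    Returns (P(B), P(A&B)).
    """
    p_b = sum(p for (a, b), p in joint.items() if b)
    p_a_and_b = joint.get((True, True), 0.0)
    return p_b, p_a_and_b


def violates_conjunction_rule(judged_p_b, judged_p_a_and_b):
    """A judgment violates the rule when the conjunction is rated above B."""
    return judged_p_a_and_b > judged_p_b


# Hypothetical joint distribution: A = "is an accountant", B = "plays jazz".
joint = {
    (True, True): 0.02,
    (True, False): 0.28,
    (False, True): 0.08,
    (False, False): 0.62,
}
p_b, p_ab = marginal_and_conjunction(joint)

# Whatever the joint distribution, P(B) can never fall below P(A&B).
rule_holds = p_b >= p_ab

# Hypothetical judged values that follow representativeness instead.
judged_violation = violates_conjunction_rule(judged_p_b=0.10, judged_p_a_and_b=0.30)
```

Because P(B) is a sum that includes the P(A&B) term, no assignment of nonnegative probabilities can make the rule fail; only judgments that depart from probability theory can.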

To  argue  that  such  systematic  errors  are  the  result  of  the 
intrusion  of  reflective  processes  on  a  reflexive  process  that  is 
otherwise  logical  is  just  the  opposite  of  what  we  should  expect 
from  psycho-logic.  Let  us  hope  we  can  put  more  faith  in 
conclusions  we  come  to  upon  reflection,  on  the  assumption  that 
our  quicker,  dirtier  reflexive  thinking  will  sometimes  be  wrong. 

One common approach taken to at least motivate the empirical
validity of an elaborate model that rests on many assumptions
is to show that it is able to account for some interesting set of
data. The model would be much more convincing if S&A
showed that it could in some sense comprehend the Little Red
Riding Hood story with which they begin their discussion, or
even some simpler story. Without even this kind of empirical
support to buttress the model, the target article leaves us with
little other than a potentially promising metaphor to describe
human reasoning. But because the mind is so good at generating
metaphors, a new metaphor is not something the study of the
mind really needs.


Phase  logic  is  biologically  relevant  logic 

Gary  W.  Strong 

College of Information Studies, Drexel University, Philadelphia, PA 19104
Electronic mail: strong@duvm.ocs.drexel.edu

The  target  article  by  Shastri  &  Ajjanagadde  (S&A)  presents  an 
exciting  new  model  of  binding-dependent  logic  (phase  logic) 
that is consistent with some very basic human information-processing
limitations. Their interpretation of Miller's magic
number  7±2  (sect.  8.2.3)  in  terms  of  binding  limitations  of  a 
synchronous  oscillatory  system  provides  a  ground-breaking  link 
between  this  well-known  limitation  of  human  cognition  and 
underlying  neural  architecture.  In  addition,  S&A  have  clarified, 
if  their  model  holds  up,  the  well-known  paradox  of  why  it  is  so 
difficult  for  novices  to  become  experts  upon  being  presented 
with rules derived from expert behavior. Until rules become
part of the long-term knowledge base (LTKB), they must be
processed reflectively rather than reflexively, and, in order for
them to become part of the LTKB, the rules must have been
relevant in a number of cases (sect. 8.5). Overall, S&A’s
phase-logic model offers an intriguing alternative to traditional AI
approaches  such  as  symbol  rewrite  systems,  showing  how  bio¬ 
logically  relevant  models  can  exhibit  a  systematic  correspon¬ 
dence  between  arguments  of  first-order  predicates  and  the 
appropriateness  of  argument  correspondence  in  terms  of  class 
membership.  It  is  a  shame  that  S&A  did  not  report  the  simula¬ 
tion  results  they  mention  in  section  10,  because  they  could  have 
been  more  convincing  in  their  arguments,  having  shown  how 
they dealt with the details of instantiating their model while
preserving  biological  realism. 

A  small  criticism  I  have  of  the  target  article  concerns  the  way 
S&A  interpret  the  logic  circuit  of  their  fact  encoder.  Their 
circuit  for  encoding  give(John,  Susan,  x)  will  not  recognize 
give(John, Susan, Car7). Their interpretation of give(John,
Susan, x) is “John gave Susan something,” which is inconsistent
with the closed world assumption (CWA) the authors assume in
section 4.4. The CWA requires that a “don’t know” answer be
viewed as a no answer and it implies a failure to recognize
give(John, Susan, Car7). A proper circuit for encoding the fact
"John  gave  Susan  something"  is  not  the  one  S&A  illustrate  but 
one  that  makes  use  of  their  IS-A  hierarchy  by  connecting  an 
abstract  object  to  the  T-and  node  as  an  inhibitor  of  the  g-obj  line. 
A  description  of  such  an  implementation  would  clarify  their 
interpretation  of  unbound  arguments  in  phase  logic. 

I have a more substantial criticism of S&A’s claim that nodes
cannot bind with more than one entity at the same time. For
example, in section 4.8 they claim that the node Animal cannot
fire in synchrony with both Tweety and Sylvester at the same
time. In a periodic phase logical system this may be a reasonable
claim but there is no reason to require that phases be periodic.
This unnecessary claim led S&A (in sect. 5.2) to what I believe is
an implausible architectural feature, that of concept clusters,
each with k_1 banks of ρ-btu nodes, where k_1 is the multiple
instantiation constant and refers to the number of dynamic
instantiations a concept can accommodate. The Strong and
Whitehead (1989) simulation, to which S&A refer, showed that,
with an appropriate architecture of spiking neurons, such
multiple bindings are possible. Our simulation demonstrated cyclic
activity in overlapping subsets of minicolumns (Strong &
Whitehead 1989, p. 396). The architecture we used includes, as basic
processing units, minicolumns that contain an ensemble of
neurons that share inputs. Such sharing also allows the
ensemble to achieve independence from absolute refractory periods of
individual neurons as well as from "noisy propagation delays."
The latter is an implementation problem recognized by S&A
(sect. 7.3) and handled through group averaging within
ensembles.

The use of mutually inhibitory minicolumns as basic
processing units caused the Strong and Whitehead (1989) simulation to
exhibit a behavior that obviates another biologically implausible
requirement that S&A claim is necessary for their model:
Entities used in argument bindings must have specific delays
associated with them (sect. 4.3) to produce a phase separation
between different entities. Whereas a central control (such as
the hippocampus) might play a role in separating entities (as
suggested by Eichenbaum et al. 1989), S&A have ruled this out.
Strong and Whitehead's simulation demonstrated that entity
separation can be achieved, in any case, without central control
and without concept clusters, even with overlapping node sets.
This can be seen in our simulation output, where the boundaries
between phases are fairly sharply defined (see Figure 1, which
shows a more recent sample of our simulation output).

In  sum,  S&A’s  model  is  a  very  important  contribution  to  the 
field of cognitive science in helping to bridge the logic-based
approaches of traditional AI and biologically relevant models of
human cognition. They have constructed this bridge with a
model whose architecture explains some very basic limitations
on human information processing. With a bit more attention to
the details of actual neural processing units, however, the power
of their model may be substantially improved. I suspect that
they  encountered  this  need  in  the  simulations  they  indicate  they 
conducted. 


Figure 1 (Strong). "Three staffs" of simulation output during a
free-running mode of operation following the learning of three
overlapping patterns. Each line represents the output of one
minicolumn. There are three different "phases" of output
activity that bind different subsets of minicolumns, and the
transitions between the phases are fairly well defined. The box
surrounds the second phase, which has one minicolumn in
common with the first phase. The phase that follows the box has
one minicolumn in common with the boxed phase.


Temporal synchrony and the speed of visual
processing 

Simon  J.  Thorpe 

Institut des Neurosciences, Département Neurosciences de la Vision,
Université Pierre & Marie Curie, 75005 Paris, France
Electronic mail: thorpe@ccr.jussieu.fr

Shastri & Ajjanagadde (S&A) have provided a remarkable
example of how ideas in cognitive science can converge. When I first
heard Lokendra Shastri describe his work with Venkat
Ajjanagadde on using temporal synchrony to tackle the dynamic-binding
problem in 1989, they were approaching the problem as
computer scientists. At about the same time, the work on
oscillatory activity in the visual cortex was beginning to cause
quite a stir in the neuroscience community (Eckhorn et al. 1988;
Gray et al. 1989), but at that point these two different
approaches appeared quite separate. In the spring of 1990,
however, a meeting on temporal coding in Paris provided an
opportunity for interdisciplinary discussion and the cross-fertilization
of  ideas.  Clearly,  S&A  have  taken  the  interdisciplinary  challenge 
seriously  and  have  worked  hard  to  make  their  model  consistent 
with  neurophysiological  and  psychological  data. 

I have some more specific points. S&A devote nearly all of
their target article to an analysis of language understanding with
an appeal for experimental data to test their ideas (sect. 8.2).
Although psychological data on language comprehension are
available, it will probably be quite difficult to test the
neurophysiological plausibility of such a model in this domain,
because so little is known about the neuronal activity during
language understanding (though see, e.g., Creutzfeldt et al.
1989). S&A do mention (sect. 2.5), however, that similar
problems of dynamic binding arise in vision. Indeed, the
connectionist model recently developed by Hummel and Biederman (1992)
also uses an approach based on synchrony of activation to tackle
the problem of binding elements during shape recognition. Can
detailed knowledge about visual system function be used to test
the feasibility of S&A's model of language understanding?

A few years ago we pointed out the serious computational
problems posed by the remarkable rapidity with which the
visual system can process images (Thorpe & Imbert 1989). We
argued that processing was so rapid that a great deal of
processing must be possible on the basis of only one or at most two
spikes per neuron. The argument was as follows. There have
been a number of reports of neurons in the monkey temporal
lobe with responses selective for complex visual stimuli such as
faces. One of the most remarkable features of such neurons is
that they typically respond only 100 to 140 msec after stimulus
presentation. On the basis of anatomical studies it would appear
that such neurons are at least 10 synapses away from the
photoreceptors of the retina (information has to go through
LGN, V1, V2, and V4 en route), which implies that each
processing stage has only approximately 10 msec before the
information has to be forwarded to the next layer. Since the
firing rates of cortical neurons rarely exceed 100 to 200 spikes
per second, this means that even if visual processing involves
essentially feed-forward processing, much must be achieved on
the basis of only 1 or 2 spikes per neuron.
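
The arithmetic behind this argument can be laid out in a short Python sketch. It simply divides the response latency by the number of synaptic stages and multiplies the resulting time window by the firing rate; the function name and the assumption that the latency is shared equally among stages are simplifications introduced here for illustration, using only the figures quoted in the text.

```python
def spikes_per_stage(latency_ms, n_stages, firing_rate_hz):
    """Return (time window per stage in msec, max spikes in that window).

    Assumes, for illustration only, that the total latency is divided
    equally among the processing stages.
    """
    window_ms = latency_ms / n_stages
    max_spikes = firing_rate_hz * window_ms / 1000.0
    return window_ms, max_spikes


# Figures quoted in the text: ~100 msec latency, ~10 synapses, 100-200 spikes/sec.
window_low, spikes_low = spikes_per_stage(100, 10, 100)
window_high, spikes_high = spikes_per_stage(100, 10, 200)
```

With a 100 msec latency over 10 stages, each stage has a 10 msec window, in which a neuron firing at 100 to 200 spikes per second can emit between one and two spikes.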

The strength of this argument has been considerably
enhanced by some recent data of Oram and Perrett (1992), who
looked at the time course of the face selectivity of neurons in the
primate temporal lobe and reported that selectivity is fully
present during the first 5 msec of the neuronal response even for
neurons with onset latencies of less than 100 msec. Other data
on neurons at earlier stages in the visual system from our own
laboratory also found that selectivity was typically present right
from the very start of the neuronal response, both in the case of
orientation selectivity (Celebrini et al. 1993; Thorpe et al. 1989)
and selectivity to stereoscopic disparity (Thorpe et al. 1991).
Such data imply that information processing must be very rapid.
Oram and Perrett conclude that “the only way to account for the
rapid discrimination is to consider a coding system in which the
first spike from multiple sources is used to transmit information
between stages of processing” (p. 70).

But  if  processing  of  even  complex  stimuli  such  as  faces  can  be 
achieved  on  the  basis  of  only  one  or  perhaps  two  spikes  per 
neuron,  what  of  S&A's  model,  in  which  several  cycles  of  syn¬ 
chronized  activity  are  required  to  allow  reasoning?  One  re¬ 
sponse  would  be  that  although  there  is  clear  evidence  for  a 
hierarchically  organized  architecture  in  the  case  of  the  visual 
system - cells in the temporal lobe are something like 10
synapses away from the retina - the same is probably not true for
the neural structures involved in language understanding. Thus
while synchronous firing may be difficult to obtain across the
different hierarchical levels involved in visual processing (at
least for the rapid visual processing that leads to the activation of
face-selective neurons in the temporal lobe), this may not be a
problem  for  language  processing. 

It  may  be  that  the  visual  system  does  not  actually  need 
oscillatory  activity  to  establish  the  sort  of  grouping-related 
synchrony  of  firing  required  by  S&A.  One  of  the  surprising 
results  of  recent  studies  of  visual  response  latencies  in  the  visual 
cortex  is  the  remarkable  range  of  onset  latencies  found.  Under 
the  same  stimulation  conditions,  some  visual  cortical  neurons 
will  start  firing  40  msec  after  stimulus  onset  whereas  others  have 
onset latencies of over 100 msec (Celebrini et al. 1993; Thorpe et
al. 1989; Vogels & Orban 1991), with most cells starting to fire
with latencies between 50 and 70 msec. This range of latencies is
sufficiently large to mean that synchrony can be used to group
subsets of neurons even in the absence of oscillatory activity. For
example, one set of features could be grouped if the relevant
neurons fired around 50 msec, whereas another set of features
could be grouped by having the relevant neurons fire around 60
msec. As Eckhorn (1991) and others have already pointed out,
stimulus-induced  synchrony  may  provide  an  alternative  way  of 
grouping  features  without  the  need  for  oscillatory  activity. 
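
A minimal Python sketch of this latency-based grouping idea follows. It assigns each neuron to a group according to which target onset time its first spike falls near. The neuron labels, the latency values, and the 3 msec tolerance are hypothetical choices made for illustration; only the 50 and 60 msec group centers come from the example in the text.

```python
def group_by_onset(latencies_ms, centers_ms, tolerance_ms):
    """Group neurons whose onset latency lies within tolerance of a center.

    Neurons that match no center are left ungrouped.
    """
    groups = {center: [] for center in centers_ms}
    for neuron, latency in latencies_ms.items():
        for center in centers_ms:
            if abs(latency - center) <= tolerance_ms:
                groups[center].append(neuron)
                break
    return groups


# Hypothetical onset latencies (msec) for five neurons.
latencies = {"n1": 49, "n2": 51, "n3": 59, "n4": 62, "n5": 105}
groups = group_by_onset(latencies, centers_ms=(50, 60), tolerance_ms=3)
```

No oscillation is involved here: the grouping depends only on when each neuron first fires after stimulus onset, which is the point of the alternative described above.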

In  conclusion,  the  use  of  stimulus  onset  induced  synchrony  in 
the  visual  system  may  allow  the  operation  of  a  feature-binding 
process  similar  to  the  one  proposed  by  Shastri  &  Ajjanagadde. 
The  major  difference  is  that  it  can  potentially  work  even  in 
feedforward  networks  and  does  not  require  the  use  of  oscillatory 
activity, a significant advantage given the difficulty that a
number of researchers have had in demonstrating oscillatory
responses with static visual stimuli (see, e.g., Tovee & Rolls 1992).


Should first-order logic be neurally
plausible?

David S. Touretzky and Scott E. Fahlman

Part of the attraction of NETL (Fahlman et al. 1983), but also a
source of difficulty, was that it conflated a theory of
representation with a parallel computing architecture. The emphasis on
parallel implementation discouraged people from looking too
closely at the representation ideas. Shastri & Ajjanagadde (S&A)
appear intent on repeating this mistake, with claims of neural
plausibility further clouding the issue. Their representation is
not very humanlike - but notice how well it fits the computing
architecture. Their architecture is not very brainlike - but look
at the complex inferences it supports. Unfortunately, when
teased apart and critically examined, neither component holds
up as a credible proposal about human cognition.

Let us first consider the representational component.
Humans can make certain kinds of inferences very rapidly, an
observation that has motivated many proposals over the years,
including NETL. S&A want to identify human reflexive
reasoning with a restricted subset of Horn clause logic. But where is
their evidence in support of this claim? The only shred of
justification we can find is that they have, lurking in the wings,
an elaborate scheme for wiring up an implementation.

Human reasoning does not seem to have much in common
with the type of inference at the core of the S&A proposal. S&A
tackle this difficulty in two ways. In sections 1.4 and 9.1 they
exclude entire classes of phenomena - analogical reasoning,
episodic memory, imagery, and associative recall by fast set
intersection - that would appear to be more central to human
reasoning than strict logical deduction (Lakoff 1987; Lakoff &
Johnson 1980). But in section 5.5 they promise something a little
more flexible than modus ponens, namely, soft rules - vaguely
defined and unimplemented - and defeasible inference. We
also note a stab at abduction in Ajjanagadde (1991).

These  crude  initial  forays  into  an  area  far  more  complex  than 
logical deduction are no substitute for a credible theory of
human  informal  reasoning,  namely,  something  comparable  in 
scope  to  Collins  and  Michalski  (1989).  Thus,  the  crux  of  the  S&A 
theory  remains  a  certain  restricted,  first-order  logical  language 
put  forth  as  a  language  of  the  brain,  with  nothing  to  recommend 
it  -  as  a  representational  theory  -  over  far  more  sophisticated 
nonconnectionist  proposals. 

The second component of S&A's proposal is a supposedly
neurally plausible implementation, a claim that falls apart
almost immediately. It is fine to propose synchronous firing as a
binding mechanism, but this feature cannot serve as the sole
biological justification for what turns out to be a complicated
parallel computer architecture.

As a neural model, the S&A proposal suffers from multiple
fatal flaws. It is essentially localist, postulating disjoint neural
populations for distinct concepts. Yet it is also highly redundant,
requiring multiple copies of any concept that might participate
in more than one simultaneous relationship. These multiple
copies are controlled by a switching mechanism whose wiring,
in terms of complexity and specificity of connections, is unlike
any neural circuitry described in the literature. We also
question whether a system that requires oscillation with up to ten
separable, stable phases is any more neurally plausible than
naive “computer in the head” models. Certainly, nothing like
this precise time-keeping has ever been observed in brains.

We conclude with some remarks on the relation of this work to
the NETL system. We see nothing in S&A’s proposal that cannot
be done by NETL, which also exhibited some additional abilities
such as set-intersection. It is a straightforward operation in
NETL to include statements like scared-of(y,x) in the prototype
definition for the preys-on(x,y) relation, and to inherit the
former whenever the latter is asserted for specific individuals.
The S&A model can be extended to pass fuzzy quantities rather
than discrete markers, but so could NETL (Fahlman et al. 1983).

NETL had a central controller that told the knowledge
representation network what to do on a cycle-by-cycle basis. S&A
observe that this is not neurally plausible and claim that a
significant contribution of their work is to show how such a
system can run without a central controller. This claim is stated
repeatedly, but it reminds us of the Wizard of Oz: “Pay no
attention to the man behind the curtain.” We believe that the
authors have hidden the controller, not eliminated it.

It is true that the S&A proposal does not require cycle-by-cycle
control. Instead of having a controller step markers
upward or downward through the type hierarchy, S&A create two
distinct parallel networks. One propagates phase-based markers
upward from a starting point and the other passes markers
downward. Some external agency must still select which of
these networks is to be active, but once this has been done, the
marker propagates as far as possible in the specified direction.
The authors can fairly claim that they have eliminated cycle-by-cycle
control but at the cost of further replication of network
hardware and the loss of certain useful operations that require
more precise control.

In any case, there remains a need for some agency to
manipulate the many control lines, set certain nodes oscillating with the
same  phase,  query  the  network  via  the  appropriate  e  and  c  lines, 
and  so  on.  In  section  10,  S&A  suggest  that  this  is  not  done  by  a 
“controller”  but  rather  by  the  “parser.”  It  appears  that  the 
central  controller  has  been  eliminated  by  distributing  a  few 
minor  elements  and  renaming  all  the  rest. 


Dynamic-binding theory is not plausible
without chaotic oscillation

Dynamic binding of knowledge is one of the essential processes
for  both  “reflexive  reasoning”  and  “reflective  reasoning”  in  the 
human  cognitive  system.  Shastri  &  Ajjanagadde  (S&A)  deal  with 
reflexive  reasoning  in  terms  of  connectionist  models  of  dynamic 
binding.  This  approach  may  assure  a  plausible  model  of  the 
process  of  dynamic  computations.  Indeed,  S&A  propose  a 
reasonable  model  of  reflexive  reasoning.  To  justify  the  model 
biologically  and  from  the  viewpoint  of  dynamical  theory,  how¬ 
ever,  they  refer  to  the  synchronization  or  phase  locking  of 
periodic  oscillations  that  was  observed  in  the  visual  cortex  of  the 
cat (Eckhorn et al. 1988; Gray et al. 1989) and monkey (Kreiter
&  Singer  1992).  This  commentary  is  devoted  mainly  to  the 
question:  Could  synchronized  or  phase-locked  periodic  oscilla¬ 
tions  provide  a  plausible  basis  for  the  dynamical  model  of 
reflexive  reasoning? 

In  the  Gray  et  al.  (1989)  experiments,  rapid  damping  of  both 
auto-  and  cross-correlation  functions  was  found.  There  are  two 
possible  causes  of  the  damping:  One  is  due  to  inherent  chaos, 
and  the  other  is  perturbation  by  noise.  The  latter  possibility  can 
be rejected. We see an apparent feature of the observed
correlations, namely, time symmetry of autocorrelations but time
asymmetry of cross-correlations (see Fig. 1-3 in Gray et al.
1989). When the periodic oscillation is perturbed by noise,
cross-correlation between two such oscillators should be
symmetric in time, as are the autocorrelations, because of the
statistically stationary motion. The assumption of the existence
of chaotic oscillators, however, leads us to a reasonable
explanation of this distinct feature of the correlations: a transient process
accompanied by desynchronization between chaotic oscillators
brings about time asymmetry of cross-correlations, while preserving
time symmetry of autocorrelations due to inherent stationary
chaotic motion.

In  addition,  we  examine  whether  or  not  desynchronization 
can  be  achieved  by  noise,  since  there  is  still  a  possibility  of  the 
participation  of  noise  in  the  transient  process,  which  may  give 
rise  to  asymmetric  cross-correlations.  Desynchronization  is  due 
to  a  separation  of  corresponding  orbits  and  the  degree  of 
separation  can  be  measured  by  the  degree  of  orbital  instability 
indicating  an  exponential  separation  of  nearby  orbits.  The 
Lyapunov exponent is the average rate of this separation in unit
time. Since desynchronization should start unless all
components of the Lyapunov spectrum are negative, the value of the
nonnegative Lyapunov exponents determines the degree of
desynchronization. The contribution especially of the largest
one, λ, will be dominant. It is reasonable that the time necessary
for desynchronization is of the same order as the inverse of the
largest Lyapunov exponent, O(λ^{-1}), since λ^{-1} is the time
necessary for the e-fold magnification of a tiny initial separation. If
noise participates in desynchronization, infinite time is
theoretically needed for desynchronization, since in the case of noise λ
is zero.

Thus, we conclude that the cause of the damping is the
existence of inherent chaotic oscillators. At the moment it is
difficult to estimate the correct value of the orbital separation of
the neural oscillations; it seems plausible, however, to estimate
it as the order of one per one cycle of oscillation. Hence, as an
order-estimation, the desynchronization takes 20-25 msec.
Taking into account a cut-off frequency of around 100 Hz in the
experiments, the unit time of the observed oscillations should
be 10 msec. Then λ is estimated at around 0.5 per unit time,
which is a reasonable value from the viewpoint of dynamical
theory. Thus, the reasoning of S&A must be amended in its
“biological  interpretation"  of  their  theory.  Actually,  our  prelimi¬ 
nary  numerical  simulation  of  the  chaotic  model  for  cortical 
neuro-oscillations  shows  much  faster  desynchronization  than 
the  theoretical  estimation.  In  most  cases,  a  time  less  than  one- 
half  cycle  of  oscillation  is  required. 
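
The order-of-magnitude estimate above can be traced step by step in a short Python sketch. It converts a separation rate of one per oscillation cycle into a Lyapunov exponent per 10 msec unit time, and then takes the inverse to obtain the desynchronization time. The function names are assumptions made here for illustration, and the exponential growth law is the standard definition of e-fold separation rather than a result of the commentary's simulation.

```python
import math


def lyapunov_per_unit_time(cycle_ms, unit_ms, rate_per_cycle=1.0):
    """Convert a separation rate per oscillation cycle to a rate per unit time."""
    return rate_per_cycle * unit_ms / cycle_ms


def desync_time_ms(lam_per_unit, unit_ms):
    """Time of order 1/lambda needed for an e-fold growth of the separation."""
    return unit_ms / lam_per_unit


def separation(d0, lam_per_unit, t_ms, unit_ms):
    """Separation after t_ms, assuming exponential growth at rate lambda."""
    return d0 * math.exp(lam_per_unit * t_ms / unit_ms)


# Figures from the text: a 20-25 msec cycle and a 10 msec unit time.
lam_fast = lyapunov_per_unit_time(cycle_ms=20, unit_ms=10)
lam_slow = lyapunov_per_unit_time(cycle_ms=25, unit_ms=10)
t_fast = desync_time_ms(lam_fast, unit_ms=10)
t_slow = desync_time_ms(lam_slow, unit_ms=10)

# After a time 1/lambda, a tiny initial separation has grown by a factor of e.
growth = separation(1e-6, lam_fast, t_fast, unit_ms=10) / 1e-6
```

Under these assumptions the exponent comes out at 0.4 to 0.5 per unit time and the desynchronization time at 20 to 25 msec, matching the estimate in the text.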

The neural (de)synchronization is a more rapid process, so the
synchronized state cannot be sustained for the few hundred
milliseconds supposed by S&A. It is plausible that the neural
synchronization makes rapid judgments by feature detection
(Gray et al. 1989), or by initiating cognitive processes (Koerner
et al. 1987). Throughout the process of thinking, including
“reflexive reasoning,” a chaotically itinerant motion among
“attractors” (we call it “chaotic itinerancy”) seems much more
plausible, one that can generally appear in systems with large
degrees of freedom (Davis 1990; Ikeda et al. 1989; Kaneko 1990;
Tsuda 1991). In such itinerant motions, the system is temporally
expressed as a “small” system, where “small” means the
participation of only a few dominant modes accompanied with a
number of inactive modes that could be active at the next period
of the process. These modes can be activated as a chaotic mode
by a large number of interactive neurons (Freeman 1987; Skarda
&  Freeman  1987).  The  temporal  reduction  of  the  number  of 
active  modes  must  stem  from  spatial  coherency  (Freeman  1991), 
but  not  from  “phase  locking.” 

Related to the above discussions, I would also like to comment
on the possibility that von der Malsburg’s model for the cocktail
party effect (von der Malsburg & Schneider 1986) has nothing to
do with the neural synchronization observed in the experiments.
The cocktail party effect is more dynamic and complex,
hence its explanation needs such a mechanism of dynamic
information processing that both coherence in space and chaotic
itinerancy in time play a role in sustaining memories during a
period of a few hundred milliseconds to a few seconds, and in
searching and linking appropriate items in the LTKB (long-term
knowledge base). Here, spatial coherence is necessary for the
dynamic link of neural activities over wide cortical regions,
especially related to auditory processing, short-term memory,
and thinking. Chaotic itinerancy creates the dynamic sustaining
of memories and the processing of meaning, namely, a dynamic
link of memory items (Tsuda 1991). We have shown that a
coupled chaotic system and a chaotic neural network, which can
exhibit chaotic itinerancy, sustain any information fed from
outside by means of propagating local chaotic activities despite
the elementary chaotic process (Matsumoto & Tsuda 1987;
Tsuda 1992).

In addition, I recommend S&A to the following literature
concerning the roles of neural synchronization. It has been
hypothesized, for example, that synchronization of neural
oscillations may participate in the processes of rapid
interpretation, image synthesis, relation formation in knowledge base,
and parallel byte-formation in the sequential flow of visual
information (Holden & Kryukov 1991; Koerner et al. 1987;
Reitboeck et al. 1990; Shimizu & Yamaguchi 1987). Furthermore,
concerning dynamic features in coupled oscillator systems,
studies by “coupled map lattices” (Kaneko 1989) and
“phase dynamics” (Kuramoto 1991) should not be overlooked.
The latter concerns mainly a periodic and synchronized regime,
and the former treats various kinds of complex dynamic
regimes.

When I was a first-year undergraduate psychologist I developed
an interest in the idea that cognition might have something to do
with the brain, and was promptly dispatched to the department
of physiology to learn something about my chosen organ. After
hearing a bit about nerve cells, impulses, and such, I produced a
gigantic first essay in which I tied up what I had learned of
physiology into an account of how the brain worked in what, I
was convinced, was a monumental achievement in quality as it
was in quantity. As it happened, my unfortunate tutor, an
eminent Spanish nociception researcher, had so queasy a feeling
about my contribution that he did not fill the pages with
learned red-ink disputation of the fine details of my proposals, as
I expected, but simply said to me as he handed it back: “Like all
psychologists, you have a scant regard for the facts!”

So I can sympathize with Shastri & Ajjanagadde (S&A). Here
they are, responding, perfectly understandably, to a
suprathreshold signal above the noisy hubbub of neurobiology, when
this neurobiologist has to say that the phenomenon on which
they base their ideas probably does not exist in the primate. As
the primate brain is the only one that we know is capable of
systematic reasoning, this may be a problem for them. But I
sympathize even more with S&A, since only rather close inspection
of the neuroscience literature reveals this problem: The
oscillophiles cited in the target article - in what one hopes is just
a temporary failure of elementary scholarship - never remark in
print on work which disagrees with them. And the problems to
which they studiously fail to refer are serious enough that they
are not adequately dealt with in a few dismissive remarks in
S&A’s note 27.

Basically, the problem is that the cat findings do not replicate
in primate visual cortex. For an alternative reading list on this,
S&A might have considered Bair et al. (1992), Kiper et al. (1991),
Gawne et al. (1991), Young et al. (1991; 1992), as well as the two
papers noted in note 27, which they dismiss so lightly (Rolls
1991; Tovee & Rolls 1992). As far as I know, the Young et al.
(1992) study is the most comprehensive primate study to date,
so I will briefly review its contents.

We sought to replicate the cat findings in the monkey. To do
this, we recorded multiunit activity (MUA) and local field
potentials (LFP) in areas V1 and MT, and MUA from the
inferotemporal cortex (IT) of macaques. Recordings in all areas
were made under conditions of stimulation and anaesthesia as
close as possible to those in the cat. In addition, we recorded
MUA in the IT of awake behaving monkeys while the monkeys
performed a face discrimination task. The data were analyzed
with methods taken from Engel et al. (1990), so that the primate
and cat results could be compared directly.

In V1, with drifting bar stimuli, all frequency spectra of the
LFPs showed the greater part of their power to be concentrated
in the low-frequency components, and on stimulation LFP
power spectra showed broad-band increases in amplitude and
not a shift in power from low to mid-frequency as has been
reported in the cat. Indeed, the effects were almost the opposite
of those in the cat: Stimulation was associated with statistically
significant increases in power particularly at the low frequencies
with a smaller increase across almost the entire spectrum. This
wide-band stimulus-related increase in spectral power may have
simply reflected that cells near the electrode fired more strongly
when stimulated than when not, and did so at a variety of
frequencies. The changes in frequency distributions would not
provide narrow-band high-amplitude field potentials to which
spike activity could become synchronized, and concomitantly,
the oscillating MUA responses that we did see were in the alpha
range, and there was no stimulus dependence.

In area MT, all LFP frequency spectra again showed most
power to be concentrated in the low-frequency components and
there were broad-band increases in amplitude on stimulation.
All oscillating responses were in the alpha range and there was
little evidence that they were stimulus related. In both V1 and
MT, therefore, with moving stimuli (cf. S&A, note 27) that were
very similar to those used in the cat experiments, there was no
sign of the cat oscillation phenomena. It may be worth noting
that this was not a “finding of no effect,” which would not
distinguish between insensitivity of the statistical analysis and
the absence of the phenomenon: The stimulus effect in the LFP
analyses was statistically reliable in all frequency bands except
those centred on the alpha range, and the statistical procedures
were able to detect oscillations at frequencies different from
those observed in the cat.

In the IT of anaesthetized monkeys, no MUA responses
indicated the occurrence of oscillation. In monkeys trained to
make a differential response to a small set of human faces versus
a larger set of faces (at which discrimination they achieved better
than 90% correct performance), only two MUA recordings
showed oscillations in the gamma range. One oscillating
response was associated with stimulation and the other was
associated with the absence of stimulation.

These results suggest that oscillating responses in the
gamma-frequency band are remarkably rare in conditions very close to
those in the cat studies and even in conditions that would be
thought to require the binding of features into a representation
coherent enough to form the basis on which the discrimination
decision could be made. The fact that such oscillations were not
stimulus dependent also suggests that oscillations are not
required for feature binding in the studied regions of the monkey
visual system.

Having used the methods of Engel et al. (1990) to classify the
data as oscillating or not, we noticed a number of methodological
problems with this and related methods. For example, the cat
researchers forgot to take account of the goodness-of-fit between
the Gabor functions (whose parameters were used to classify the
responses) and the correlograms. Obviously, if one does not care
how well a description fits, then Mrs. Thatcher could, for
example, be described as an enthusiastic European: If the
parameters of a description are to be used to classify something,
the description should fit the described thing well. We found
that Type I error due to this factor alone could vary between 17%
and 100% overestimation. Similarly, “burst,” “delayed
inhibition,” and “return” components, which are sometimes seen in
correlograms and which could be consistent with being in the
chaotic domain and not only the oscillatory domain, would
unfortunately have been included as “oscillating responses”
according to the methods of Engel et al. (1990). These
methodological difficulties leave the empirical status of some findings
in this area rather uncertain. For example, how can we know
that the “long bar experiment” did not involve false positives?

It seems unlikely, in the light of these empirical facts, that
stimulus-related oscillations could be a general phenomenon,
and unlikely, therefore, that a periodic temporal “code” is a
general solution to the problem of binding the separate features
of an object, visual or semantic, into a coherent representation.
This is a nice illustration of the dangers of having one’s
psychological theory disproved “by some irrelevant physiological
research” (Broadbent 1958), and I suppose that S&A have two
options. Either they take Broadbent seriously, and stop building
models based, even loosely, on what’s happening in
neurobiology, which would be a shame, or they pay a bit closer attention
next time.

As  for  the  rest  of  S&A’s  target  article,  it  seems  to  me  to  be 
terribly  brave,  but,  in  the  end,  just  cognitive  science. 

Our response is organized into five sections. In section R1
we respond to issues concerning the biological plausibility
of our model. In section R2 we discuss questions
about its cognitive/psychological significance. Several
commentators pointed to alternative approaches to dynamic
bindings and reflexive reasoning. We discuss these
in section R3. In section R4 we respond to some comments
about learning. The remaining issues are discussed
in section R5. In what follows we refer to our model as
SHRUTI.¹

R1.  Biological  plausibility 

Before responding to commentaries about the biological
plausibility of SHRUTI let us repeat what we said in section
1.4 of the target article: “Neural plausibility is an important
aspect of this work - we show that the proposed
system can be realized by using neurally plausible nodes
and mechanisms, and we investigate the consequences of
choosing biologically motivated values of system parameters.
Needless to say, what we describe is an idealized
computational model, and it is not intended to be a
blueprint of how the brain encodes an LTKB (long-term
knowledge base) and performs reflexive reasoning.” We
would like to stress that SHRUTI is an idealized computational
model, and when we claim that it is biologically
plausible we mean that it is possible to realize its essential
components - the behavior of various types of nodes and
the functionality of the proposed network - using neural
wetware.

R1.1. Synchrony, oscillations and biological plausibility.

One of the major issues raised in the commentary concerns
the biological reality of oscillations. With varying
degrees of emphasis Young, Freeman, Tsuda, and
Eckhorn point out that periodic (oscillatory) activity does
not occur in the brain. Freeman and Young even go on to
suggest that SHRUTI is therefore not biologically plausible.
This conclusion rests on a mistaken understanding of
the role of oscillations in SHRUTI.

R1.1.1. Oscillations are not essential for the functioning
of SHRUTI. The essential feature of neural activity required
by SHRUTI is synchronization of cell activity and the
propagation of synchronous activity along connected
cell-clusters. Since transient oscillatory activity seemed like a
natural way of realizing such a behavior, we adopted
oscillations in our model, but oscillations per se are not
essential for the functioning of SHRUTI. It is therefore
incorrect to link its biological plausibility with the existence
or nonexistence of oscillations in the brain. The
crucial question is this: Is synchronous activity biologically
real? The nonessential role of oscillations in our
model is clearly recognized by Rohwer and Strong (and to
some extent by Thorpe).² Strong also points to some
architectural simplifications that might result from dropping
the periodicity requirement (but see R1.4). Eckhorn
and Freeman do discuss the possibility that dynamic
bindings may be represented by aperiodic (nonrhythmic)
synchronous activity in the brain, but they fail to recognize
that such a representation is compatible with
SHRUTI.

The primacy of synchronization in the representation
and propagation of dynamic bindings is pointed out at
several points in the target article, including the title.
“We represent dynamic bindings between arguments and
fillers by the synchronous firing of appropriate nodes”
(sect. 3, para. 3; see also sect. 1.3, para. 3; and sect. 3.1,
para. 2). The behavior of ρ-btu nodes in section 3.2 and
τ-and nodes in section 3.3 is defined in terms of general
synchronous activity and then elaborated for the case of
oscillatory activity. The output of τ-or nodes has been
specified as being oscillatory. This is not critical, however,
and it is trivial to modify the design so that the output of
τ-or nodes may be assumed to be a burst of activity whose
duration is comparable to π_max.

In the aperiodic case, the parameters π_min and π_max
correspond to the minimum and maximum allowable time
between two consecutive firings of a cell-cluster involved
in synchronous activity. The interpretation of ω continues
to be the width of the window of synchrony (see sect. 3.1,
last para.). So the basic architecture of SHRUTI remains
the same even if we admit aperiodic synchronous activity.
The propagation of bindings now parallels even more
closely than before the propagation of activity along
“synfire chains” (Abeles 1982).

It is important to note that dropping the requirement of
periodicity does not change the predictions about reflexive
reasoning. The restriction on the form of rules, the
bounded depth of reasoning, and the constraints on the
capacity of the WMRR (working memory underlying
reflexive reasoning) remain the same. The exact numeric
value of the ratio π_max/ω, however, may have to be revised
using the appropriate neurophysiological data.
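
The role of this ratio can be illustrated with a small Python sketch. It is not part of the model's specification: it assumes that each entity held in the WMRR occupies its own window of synchrony within an interval of at most π_max, and that two firings count as synchronous when they lie within half a window width of each other. The function names and the numeric values are hypothetical placeholders, not measured data.

```python
import math


def max_distinct_phases(pi_max_ms: float, omega_ms: float) -> int:
    """Upper bound on the number of non-overlapping windows of synchrony.

    Assumption: every entity needs one window of width omega_ms, and all
    windows must fit between two consecutive firings (at most pi_max_ms).
    """
    if pi_max_ms <= 0 or omega_ms <= 0:
        raise ValueError("pi_max_ms and omega_ms must be positive")
    return math.floor(pi_max_ms / omega_ms)


def in_synchrony(t1_ms: float, t2_ms: float, omega_ms: float) -> bool:
    """Treat two firings as synchronous if they differ by less than omega/2."""
    return abs(t1_ms - t2_ms) < omega_ms / 2


# Hypothetical values chosen only to show how the bound is computed.
capacity = max_distinct_phases(pi_max_ms=30.0, omega_ms=6.0)
```

Under these placeholder values the bound is five entities. Revising either parameter in the light of new data changes the number but not the form of the constraint, whether or not the activity is periodic.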

R1.1.2. What led us to oscillations? When he asserts that
our ideas and SHRUTI are based on the phenomenon of
oscillatory activity in the animal brain, Young (para. 2) has
it backwards. On the contrary, the design of SHRUTI was
driven by the computational constraints on connectionist
models enumerated in section 1.2 and the complexity
requirements of reflexive reasoning discussed in section
1.1. The computational constraints prompted us to use
temporal synchrony as a basis for representing dynamic
bindings (to obviate the need of propagating pointers or
symbols). We chose periodic (oscillatory) activity simply
because oscillations seemed like the most straightforward
and natural way of incorporating synchronous activity. As
Thorpe points out, it was only later that we heard about
the evidence for oscillatory activity in the brain.

R1.1.3.  Can  we  conclude  that  oscillations  are  ethereal? 

The biological reality of oscillations is a matter of controversy.
There are a growing number of reports of oscillatory
activity in the brain - these include findings in the cat
(see target article for references), squirrel monkey
(Livingstone 1991), macaque (Engel et al. 1992; Kreiter et al.
1992), and even humans (Llado et al. 1992).³ At the same
time, we have negative evidence concerning oscillatory
activity - we cite some papers in the target article and
Young cites some additional findings. So the issue is far
from settled. In spite of such conflicting evidence,
Young’s emphatic assertion that oscillations do not occur
in the primate brain and are unlikely to play a role in
the representation of dynamic bindings does not seem
justified.

R1.2. Expected nature of oscillatory activity. Let us assume
that oscillatory activity underlies the representation
of dynamic bindings during reflexive reasoning. What
sort of activity should we then expect to find in the brain
during an episode of reasoning? The answer would vary
dramatically depending upon our expectations about the
nature of representations used by the brain. It is crucial
that we recognize this, because not doing so may lead to
erroneous expectations about the nature of oscillatory
activity in the brain, and in turn, to wrong interpretations
of raw data. We address this question in the context of
periodic activity but our remarks also apply to aperiodic
activity.

If one believes in fully distributed representations and
assumes that entities are represented as patterns of activity
over large populations of cells, one would expect a
large number of cells to participate in oscillatory activity
during an episode of reflexive reasoning. On the other
hand, if one believes in more compact representations of
the type adopted in SHRUTI, one should only expect a
relatively small number of cells to participate in oscillatory
activity.

Now consider Freeman’s observation (para. 3) that
periodically firing cells form a small tail in a distribution of
firing rates and that a majority of cells yield a pulse
interval that is more Poisson than periodic. What one
concludes from the data would depend on one’s assumptions
about the nature of representations. Someone who
believes in distributed representation would be compelled
to conclude that oscillatory activity does not underlie
the representation of dynamic bindings, but someone
who believes in more compact representations would find
good evidence for the hypothesis that oscillatory activity
underlies dynamic bindings; because only a very small
fraction of cells would be involved in oscillatory activity at
any time, the small tail constitutes just the right evidence!

R1.2.1. The nature of oscillations predicted by SHRUTI. Let
us consider a thought experiment to illustrate the nature
of oscillatory activity entailed by a SHRUTI-like system.
Assume that the system is in a “quiescent” state, namely,
it is not receiving any stimulus and is not engaged in any
systematic thought. At this time the nodes in the system
would be firing with some background rate, perhaps
Poisson. Now assume that the dynamic fact “John bought
a Rolls Royce” is injected into the system. We would
expect two resultant trains of oscillatory activity to propagate
in the system. One would originate at the John and
buyer clusters and rapidly expand to include other clusters
representing owner, person, wealthy, and so on. A
second train of activity would originate at the Rolls-Royce
and buy-object clusters and expand to include other
clusters such as car and own-object. This oscillatory activity
might last only a few milliseconds, after which the
synchronization would probably break down. The active
nodes may, however, continue to fire at a high rate for
some time before reverting to the background rate of
firing.
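
The thought experiment can be mimicked with a minimal Python sketch. It is an illustration only, not an implementation of SHRUTI: each cluster is reduced to a name, each train of synchronous activity to an integer phase label, and the `LINKS` table is a hypothetical wiring chosen to reproduce the clusters named above.

```python
from collections import deque

# Hypothetical links: activity in the key cluster drives the listed clusters.
LINKS = {
    "John": ["person"],
    "buyer": ["owner"],
    "owner": ["wealthy"],
    "Rolls-Royce": ["car"],
    "buy-object": ["own-object"],
}


def propagate(initial_phase):
    """Spread each phase label along LINKS; a driven cluster adopts the
    phase of the cluster that first activates it."""
    phase = dict(initial_phase)
    queue = deque(initial_phase)
    while queue:
        src = queue.popleft()
        for dst in LINKS.get(src, []):
            if dst not in phase:
                phase[dst] = phase[src]
                queue.append(dst)
    return phase


def trains(phase):
    """Group clusters that fire in the same phase."""
    groups = {}
    for cluster, p in phase.items():
        groups.setdefault(p, set()).add(cluster)
    return groups


# Injecting the dynamic fact: John and buyer share one phase,
# Rolls-Royce and buy-object share another.
result = trains(propagate(
    {"John": 0, "buyer": 0, "Rolls-Royce": 1, "buy-object": 1}
))
```

The two groups in `result` correspond to the two trains of activity: clusters bound to John end up in one phase and clusters bound to the Rolls-Royce in the other, while every cluster not reached by a link stays silent.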

The model posits that arguments and focal nodes of
concepts are encoded by small clusters of cells. Even if
the  reflexive  reasoning  following  an  input  results  in  the 
activation  of  several  hundred  relations  (predicates)  and 
types/features,  the  total  number  of  nodes  engaged  in 
synchronous  activity  during  an  episode  of  reasoning  may 
remain  small  -  perhaps  about  10-’ cells.  Furthermore,  this 
activity  would  be  distributed  across  the  area(s)  where 
conceptual  knowledge  is  represented.  This  estimate  is 
extremely  crude  and  speculative  and  may  be  off  by  an 
order of magnitude, but it still conveys the essential point:
A very small fraction of cells (perhaps as few as one in
about a hundred thousand) may be involved in synchronous
activity during an episode of reasoning (this
already assumes that we are focusing on some appropriate
1-10% of the brain where we expect conceptual knowledge
to be represented).

R1.2.2. A fully developed SHRUTI-like system will have
complex dynamics. Consider a fully developed SHRUTI-like
system incorporating the functionality outlined in
sections 10.1-10.4. The extended system would be capable
of responding to continuous stimuli and of shifting its
focus of attention. The dynamics of such a system would
be far more complex than the simple oscillatory patterns
depicted in the examples shown in the target article. The
frequency of its oscillations would vary constantly because
frequency increases whenever entities “leave” the
WMRR and decreases whenever other entities “enter.”
Different modules in the system would be firing at different
frequencies and each would have its own phase
distribution. It is not at all surprising that the oscillatory
activity observed in the brain is far more complex than the
activity portrayed in the target article! We had noted this
in section 7.1.

R1.2.3. Chaos and oscillations. An alternative, much
more complex description of neural activity based on the
notion of chaotic oscillations is offered by Tsuda. The
relation between his characterization and SHRUTI needs
to be examined further, but it appears that Tsuda may be
underestimating the degree of systematicity required for
supporting reasoning of the sort we discuss in the target
article. The dynamics he describes seem more apt for a
system that is not engaged in systematic reasoning resulting
from a specific stimulus, and the activity of a SHRUTI-like
system in a disengaged state could well be chaotic,
but we think the activity of the appropriate subset of
nodes would have to get organized rapidly once the
system engages in systematic reasoning.

R1.3. Dynamic bindings and neural communication. Section
3 of Eckhorn’s commentary suggests that he may not
realize how much information must be transmitted to
communicate dynamic bindings during reflexive reasoning.
In discussing the limited ability of a neuron to
transmit symbolic information we had estimated the
amount of information transmitted in 15 msec to be about
2 bits (see Note 4). This was based on the assumption that
the firing rate typically varies between 1-200 spikes/sec.
Eckhorn, however, argues that the maximum rate of firing
can be as high as 300 bit/sec, and that if we assume a 20
msec cycle time, the amount of information transmitted
by a neuron can be as high as 8 bits. Unfortunately, this
does not change the situation one bit! Neither 2 nor 8 bits
are sufficient for solving the dynamic binding problem
during reflexive reasoning. The number of bits a neuron
would need to transmit to communicate the identity of an
argument filler will be more than 20. Contrary to what
Eckhorn seems to imply, the number of distinct entities
that may fill arguments in dynamic bindings is not 500 but
closer to 100,000.⁴ This means that even if we were to
assume perfect coding and noiseless communication we
would require 20 bits to communicate the identity of each
filler.

Eckhorn also suggests that the neuronal limitations in
communicating symbolic information may be overcome
by using clusters of neurons rather than single ones. In
section 9.4 of the target article, we discuss such a possibility
and point out the advantages of using the temporal
synchrony approach.

In paragraph 5 of his commentary, Thorpe suggests
there is a tension between the fact that SHRUTI takes
several cycles of synchronous activity to compute a response
and other evidence suggesting that neurons respond
within just one or two spikes (cycles). He seems to
be overlooking the fact that an episode of reasoning takes
several cycles of synchronous activity because it involves
the propagation of synchronous activity over several
layers of cells - as many layers as the length of the chain of
reasoning. The propagation of activity across each layer,
however, only takes 1-2 cycles (spikes). This is exactly
what one would expect in view of Thorpe’s discussion in
his commentary (paras. 3, 4).

R1.4. Complexity of node types and circuits. Several
commentators (Dawson & Berkeley, Diederich, and
Garson) suggest that the node types used in SHRUTI are
not biologically plausible. The behavior of ρ-btu nodes is
eminently plausible and if Abeles (1982) is right about the
significance of synchronous activity and synfire chains, it
can even be argued that a ρ-btu node with an appropriately
high threshold is a reasonable idealization of a
neuron. The other two types of nodes, namely, the τ-and
and τ-or nodes, are best viewed functionally as simple
circuits made up of a small number of cells.

Cottrell, Dawson & Berkeley, Diederich, Garson,
Koerner, and Touretzky & Fahlman remark that some of
the circuits used in SHRUTI, particularly the multiple
instantiation switch networks, are too complex and specific
to be biologically plausible. First, the switches described
in the target article are intended to demonstrate
that it is possible to achieve the desired control and
functionality by circuits made up of ρ-btu, τ-and, and τ-or
nodes. These circuits amply demonstrate this possibility.
Second, we agree that these circuits are quite specific and
complex. To put things in perspective, however, we would
like to point out that the concern about the circuits’ being
too specific and complex to be biologically plausible is
misplaced and stems in part from the tacit assumption
that these circuits have to be learned by an agent. There is
no reason, however, to assume that such circuits - or,
rather, circuits that are functionally equivalent to these -
are learned developmentally. It is enough to assume that
they have been “designed” by a process that operates at
the evolutionary scale. Surely evolutionary processes are
capable of crafting something as simple as the circuit in
Figure 22. To think otherwise amounts to ignoring the
intricacy, specificity, and complexity of the brain, not to
mention the human body. Note that the internal circuitry
of each switch is the same, so the same circuitry can be
replicated over and over again. Learning need only involve
connecting the input and output “wires” of preexisting
switches to the input and output wires of concept
clusters.

On a different note, Strong suggests that the design of a
concept cluster may be simplified if the periodicity requirement
is relaxed. This proposal is interesting because
it allows the potential of sharing nodes and seems to be
capable of self-induced phase separation. Although we
can see how the proposed alternative allows multiple
instances of a concept to be represented, it is not clear
how it solves the difficult technical problem of communication
between two concept clusters. It would be instructive
to generalize the proposal to encode n-ary predicates
so that several predicate instantiations may be represented
without cross-talk. It would also be interesting to
see how the arguments of antecedent and consequent
predicates can be linked to ensure that bindings pertaining
to several instantiations may propagate without
cross-talk.

R1.5. Timing estimates. It is argued by Carson and by
Hirst & Wu that the nodes in SHRUTI do not correspond to
actual  neurons  and  that  it  is  therefore  inappropriate  to 
conclude  anything  about  the  actual  time  course  of  reflex¬ 
ive  processing  based  on  an  analysis  of  our  model.  The 
timing  data  we  present  in  section  8.1.1  are  meant  to  be  a 
broad  indicator  of  reasoning  times  and  their  main  purpose 
is  to  demonstrate  that  reflexive  reasoning  can  be  per¬ 
formed  within  a  few  hundred  milliseconds  by  a  system  of 
simple  and  slow  computing  elements.  Note  that  the  basis 
of  our  estimates  is  the  time  it  takes  synchronous  activity  to 
propagate  from  one  cluster  of  cells  to  another.  Our  esti¬ 
mates  are  therefore  not  too  sensitive  to  actual  encoding 
details  as  long  as  the  number  of  layers  in  an  alternative 
implementation  of  the  switches  and  clusters  is  compara¬ 
ble  to  the  number  of  layers  in  our  implementation. 

R1.6. Restriction on the number of entities. Our estimates
of the maximum number of different entities that can be
referenced in the WMRR are challenged by Koerner. He
argues that we should not use data about the periodicity of
oscillations of cells involved in early visual processing to
make inferences about complex psychological processes,
citing his own interpretation of the 7 ± 2 limit based on
the much slower θ activity.

Our estimates apply to first-order bindings, those for
which argument fillers are entities and not dynamic
relational structures (an entity may be a complex relational
structure as long as it is a static one). We believe that
reflexive reasoning primarily involves first-order bindings;
for such bindings, the appropriate interpretation of
π is the cycle time of relatively small cell-clusters participating
in synchronous activity (periodic or aperiodic).
This interpretation of π would, we believe, apply to first-order
bindings involved in reflexive reasoning as well as
vision, modulo variations in the characteristics of different
cell types.

A  possible  reason  for  Koerner’s  objection  may  be  that 
he  is  referring  to  higher-order  relational  structures  such 
as  patterns  of  activity  corresponding  to  complete  solutions 
and  hypotheses,  in  which  fillers  themselves  can  be  dy¬ 
namic  relational  structures.  If,  as  Koerner  suggests,  tem¬ 
poral  synchrony  is  used  to  prevent  cross-talk  among  such 
complex  and  large  dynamic  structures,  the  associated 
temporal  activity  is  likely  to  have  a  much  larger  cycle  time 
and hence a much higher value of π.

R1.7. “Local” representations and biological plausibility.
It is suggested by Touretzky & Fahlman (paras. 5, 6) and
by Hirst & Wu that SHRUTI is not biologically plausible
because it uses localist representations! The representation
of a predicate in SHRUTI is not a single node; it is a cluster
of n + 2 nodes (n role nodes and 2 τ-and nodes). As
Hummel & Holyoak point out, this means that our representation
of a predicate instance is a distributed pattern
(though not in a holographic sense). In addition, each
argument node maps to several cells that may be physically
distributed. So our model makes use of a physically
“distributed” representation even though the representation
of an argument is conceptually “localist.” In view of
the above, it is not clear which biological axiom Touretzky
& Fahlman and Hirst & Wu think we are violating. If they
are suggesting that our model does not adhere to the
holographic version of distributed representation then we
refer them to Hummel & Holyoak’s commentary and
section R3.1, where it is argued that such representations
cannot support systematicity and knowledge-level
parallelism.

R1.8. Are brain mechanisms totally distinct across modalities?
“Whether or not the brain makes use of temporal
synchrony  in  object  perception  has  no  bearing  on  how  we 
reason  abstractly,”  writes  Sloman  (para.  2,  emphasis 
added).  Diederich  (para.  8)  expresses  a  similar  concern 
(though in a milder form). We think Sloman’s stand is an
extreme  one.  There  are  good  reasons  to  suspect  that  the 
mechanisms  developed  for  perceptual  and  motor  process¬ 
ing  were  coopted  by  the  brain  to  solve  other  cognitive 
problems. 

R1.9. Central control and SHRUTI. Doubts are expressed by
Koerner about our view that the reflexive-reasoning process
can run without a central controller. He suggests that
central control will be required during reasoning and
decision making. He envisions such a controller guiding
the activity into a “globally consistent” state and “focusing”
the “search” toward additional support for the chosen
hypothesis. We welcome his comments and pointers to
his work on related problems, but we think they pertain
more to reflective than reflexive processing. Notice that
what he describes seems like deliberative reasoning and
decision making where the system must choose one of
several competing hypotheses, find a globally consistent
state, and carry out focused search for support. This is
exactly the sort of processing we have described as reflective
(see sect. 1). It is quite likely that such reflective
reasoning requires highly controlled and focused activity
driven by an attentional mechanism (see sect. 8.2.2).

Note also that the view of representation and processing
put forth by Koerner relies much more on finely
structured  temporal  activity  than  on  the  kind  envisioned 
in  SHRUTI  (even  in  its  extended  form).  One  of  the  features 
of  SHRUTI  is  that  it  uses  a  great  deal  of  spatial  structure  to 
reduce  its  dependence  on  the  fine-grained  structure  of 
temporal  activity.  It  is  possible  that  a  greater  dependence 
on fine-grained temporal structure has led Koerner to the
conclusion that central control is necessary for controlling
the temporal aspects of activity.

Dawson & Berkeley (para. 2) and Touretzky & Fahlman
(paras. 8-10) confuse an “interface” with a “central
controller,”  accordingly  arguing  that  our  system  already 
has  a  (hidden)  controller!  We  respond  to  this  in  section 
R5.3. 

R1.10. Some imaginative arguments against the biological
plausibility of SHRUTI. Our system is not biologically
plausible, Dawson & Berkeley (para. 5) claim, because it
requires an “external learn signal” to trigger the learning
of  a  fact.  Their  observation  is  based  on  a  statement  in 
section  10.5  about  a  scheme  for  storing  facts  in  medium- 
term  memory.  We  indicated  there  that  a  fact  is  learned  in 
the  presence  of  a  “learn  signal.”  Surely  it  is  plausible  that 
an  internally  generated  signal  based  on  the  novelty  or 
salience  of  an  input  allows  the  one-shot  learning  of  a 
situation  (fact).  Next,  Dawson  &  Berkeley  invoke  the 
biological  implausibility  of  backpropagation  to  claim  that 
our  system  is  not  biologically  plausible.  The  most  imag¬ 
inative  objection  to  the  biological  plausibility  of  our  sys¬ 
tem,  however,  is  that  the  limitation  of  the  (logical)  infer¬ 
ential  power  of  our  system  “casts  further  doubt  on  its 
putative biological plausibility”! To suggest that SHRUTI is
not  biologically  plausible  because  it  lacks  inferential 
power  seems  to  miss  the  point  altogether. 

R2.  Cognitive  significance 

The comments about the cognitive significance of SHRUTI
cover a wide range. On the one hand, we have Touretzky
& Fahlman’s sweeping dismissal of our model; on the
other hand, we have Oaksford & Malloch’s enthusiastic
appraisal of it as a “potential landmark in the cognitive
science/psychology of human reasoning”! We think an
objective and careful evaluation of the target article will
help the reader determine which of these is the better
characterization of SHRUTI.

The criticism of SHRUTI centers around two themes:
The first concerns the lack of empirical support. The
second concerns incomplete coverage of the reflexive-reasoning
phenomenon.

R2.1. Empirical support and cognitive modeling. We
agree that, contrary to standard practice in psychology,
we have not tried to replicate a specific data-set obtained
in a laboratory experiment. We have pursued a very
different approach, focusing on a set of space and time
constraints obtained from some broad but well-grounded
observations about the nature of reflexive reasoning (see
sect. 1.1.1; see also Hampson, Oaksford & Malloch, and
Ohlsson). At the same time, we have also focused on a set
of computational constraints that characterize the architecture
underlying cognition - in particular, its limited
ability to process and communicate symbolic information
(sect. 1.2). What is described in the target article is (1) a
detailed model that satisfies these two sets of constraints
and (2) the psychological implications of the model.

In suggesting that our model is rich in assumptions,
Sloman may be missing the significance of these computational
and architectural constraints and labeling them as
mere assumptions. We sympathize with him for this
confusion. As Oaksford & Malloch point out (para. 3), our
approach is not very common in cognitive psychology,
where  issues  of  computational  effectiveness  and  sca¬ 
lability  are  often  secondary  and  the  emphasis  is  on  build¬ 
ing  empirically  adequate  models  that  fit  some  well- 
circumscribed  body  of  data. 

R2.1.1. SHRUTI and predictions. In addition to satisfying
the  general  set  of  constraints  on  space-time  resources  and 
information  processing  abilities  of  nodes  and  links, 
SHRUTI  also  leads  to  several  specific  and  testable  predic¬ 
tions  about  the  nature  of  reflexive  reasoning.  A  number  of 
psychologists  have  remarked  on  the  significance  of  these 
predictions (see Diederich, Hampson, Hummel & Holyoak,
Oaksford & Malloch, and Ohlsson). As described in
section 8, SHRUTI predicts the capacity of the working
memory  underlying  reflexive  reasoning  (WMRR)  and  the 
form  of  rules  that  can  participate  in  such  reasoning. 
SHRUTI  also  predicts  that  the  maximum  depth  of  deriva¬ 
tions  during  systematic  reasoning  will  be  shallower  than 
that  of  associative  priming  (sect.  8.2.6). 

Oddly enough, Sloman and Touretzky & Fahlman
discount all these predictions. Inexplicably, Sloman dismisses
the predictions concerning the capacity of WMRR
in suggesting that Baddeley’s work “already accounts for
working  memory  data”!  Yet  the  target  article  explicitly 
states  that  the  WMRR  is  the  functional  description  of  the 
dynamic  activity  of  the  LTKB  and  is  quite  distinct  from 
the  notion  of  working  memory  studied  by  Baddeley  (see 
sect.  8.2.2).  We  would  like  to  stress  that  our  predictions 
about  the  capacity  of  WMRR  are  not  only  potentially 
important  for  reflexive  reasoning,  but  they  may  also  lead 
to  insights  into  other  reflexive  phenomena  as  well  (e.g., 
see  Henderson’s  [1992]  work  on  parsing). 

Sloman  also  describes  the  restrictions  on  the  form  of 
rules  identified  in  section  8.2.5  as  “rather  arbitrary”  and 
“unlikely  to  be  useful.”  He  overlooks  the  discussion  in 
section  4.9,  where  we  suggested  why  the  constraint  may 
have  a  fundamental  computational  basis.  Since  the  writ¬ 
ing  of  the  target  article  we  have  a  proof  that  the  constraint 
on  the  form  of  rules  is  an  essential  one  for  reflexive 
processing  (Dietz  et  al.  1993).  In  other  words,  reasoning 
involving  rules  that  violate  the  constraint  in  question 
cannot  be  carried  out  using  space  that  is  only  linear  in  the 
size  of  LTKB  and  time  that  is  independent  of  the  size  of 
LTKB. 

Sloman  goes  on  to  argue  that  our  predictions  about  the 
limitations  on  the  use  of  transitivity  are  also  wrong.  His 
counterexample seems to be based on a misunderstanding
of the issue at hand. It does not establish that people
can compute transitivity with ease; it simply demonstrates
the rather obvious fact that people are good at
identifying certain linear orderings - in this case the natural
ordering of integers! Note that an agent can determine
whether P_i implies P_j simply by noting whether or not i is
less than j.

Surely  Sloman  must  realize  that  his  example  does  not 
provide  an  appropriate  test  of  the  prediction.  Consider 
the following “experimental data”: After being shown the
sequence F001, F002, ..., F0037, subjects were able to
recall correctly all the 37 items in the sequence; furthermore,
they were also able to recall the exact order in
which  the  items  were  presented.  Would  Sloman  conclude 
from  this  that  our  memory  span  is  37  and  that  there  is  no 
recency  effect? 

As  for  explaining  human  fallacies,  the  model  makes  a 
number  of  predictions  about  when  people  might  provide 
no  answer  or  a  wrong  one.  The  constraints  on  the  form  of 
rules,  the  capacity  of  WMRR,  and  the  depth  of  reasoning 
all  point  to  the  numerous  ways  in  which  human  reflexive 
reasoning  is  fallible.  We  are  also  aware  that  we  have  not 
modeled  all  sources  of  error  and  all  factors  that  lead  to 
nonprescriptive  behavior  (e.g.,  see  sect.  5,  para.  2).  The 
phenomenon  that  Sloman  refers  to,  namely,  the  graded 
nature  of  category  inclusion,  is  the  type  of  phenomenon 
that  is  relatively  easy  to  model  in  a  connectionist  network 
using  weighted  links,  so  we  are  not  surprised  that  Sloman 
has  a  “simple  model”  that  mimics  this  specific  effect. 

Touretzky  &  Fahlman  set  up  a  false  contrast  when  they 
cite  the  work  of  Collins  and  Michalski  (1989)  in  evaluating 
the  significance  of  our  work.  The  two  efforts  are  motivated 
by very different concerns and goals. Collins and Michalski’s
work is clearly significant, but it does not address
the  problem  of  reflexive  reasoning,  computational  effec¬ 
tiveness,  or  biological  plausibility. 

We  do  agree  with  Sloman  that  it  would  be  nice  to  see 
how  our  model  could  comprehend  a  simple  story.  We  also 
agree  with  Ohlsson  that  the  results  of  the  model  need  to 
be  integrated  with  existing  psychological  theories  and 
with  Martin  that  it  is  important  to  identify  the  relevant 
empirical data that would serve to corroborate the reflexive/reflective
distinction suggested by SHRUTI. Oaksford
&  Malloch  (para.  7)  point  to  experimental  results  that 
provide  corroborative  data  about  the  reflexive/reflective 
distinction.  We  hope  other  cognitive  psychologists  will 
also  contribute  in  this  regard.  In  this  context  we  would 
like  to  add  that  the  validation  of  the  constraints  proposed 
by  the  model  need  not  come  from  the  area  of  reasoning 
alone.  They  may  also  be  validated  by  examining  their 
implications  for  parsing,  another  reflexive  phenomenon, 
and  one  for  which  there  exist  extensive  empirical  data. 
Henderson’s (1992) work on parsing, using a SHRUTI-like
model,  is  beginning  to  show  that  such  restrictions  help 
explain  some  of  the  limitations  of  human  parsing  by 
modeling  certain  garden  path  phenomena  and  people’s 
limited  ability  to  deal  with  center-embedding. 

R2.2. Questions about coverage: Red herrings and real
issues. Several commentaries raise the issue of coverage,
pointing out that SHRUTI does not model every type of
reflexive-reasoning behavior. These include several insightful
remarks by Barnden and Hummel & Holyoak
and valid observations about the lack of a treatment of
negation by Cottrell and Carson. Munsat and Bauer
seem to have arrived at a genuine misconception about
the reasoning ability of a SHRUTI-like system, whereas
Touretzky & Fahlman first caricature our model and then
dismiss it as uninteresting!

R2.2.1. Logic, deduction and SHRUTI. According to Touretzky
& Fahlman, our model is simply a limited theorem
prover. They dismiss our work based on this claim,
arguing that because human reasoning is not purely
deductive, SHRUTI cannot be a credible model of it.
Sound  argument,  but  the  premise  is  unfortunately  false. 

Hirst  &  Wu  state  that  not  all  reasoning  is  deductive, 
enumerating  five  problems  they  (rightly)  claim  require 
nondeductive  reasoning.  We  agree  (see  Ajjanagadde 
1991;  Shastri  1988a;  1988b). 

Let  us  reiterate  the  basic  technical  problem  we  have 
solved:  We  have  extended  the  representational  adequacy 
and  inferential  power  of  neurally  plausible  models  by 
demonstrating  how  connectionist  networks  can  (1)  repre¬ 
sent  relational  structures  in  a  dynamic  manner  and  (2) 
propagate  such  structures  efficiently  and  systematically  in 
accordance  with  “rules.”  Note  that  the  essential  charac¬ 
teristic  of  a  rule  is  that  it  specifies  a  systematic  mapping 
between  the  roles  of  relational  structures.  As  pointed  out 
in  the  target  article  (1)  these  relational  structures  can  be 
viewed  as  schemas  or  frames,  hence  rules  may  be  thought 
of  as  mappings  between  schemas  or  frames,  and  (2)  the 
rules  (mappings)  can  be  “deductive”  or  “evidential,”  in 
particular,  they  may  be  sensitive  to  the  type/features  of 
the  role  fillers  in  a  given  situation. 

The  representational  significance  of  the  mechanisms 
developed in SHRUTI extends beyond deductive reasoning;
the ability to represent and systematically propagate
relational  structures  dynamically  lies  at  the  core  of  not 
just  deduction  but  also  evidential,  abductive,  and  analogi¬ 
cal  reasoning.  We  discussed  this  in  section  3.5  and  at 
several  places  in  the  target  article,  pointing  out  the 
broader  representational  significance  of  being  able  to  deal 
with  predicates,  variables,  rules,  and  dynamic  bindings 
(see,  e.g.,  sect.  1.3,  last  para.;  sect.  2.5;  sect.  10,  para.  1). 

In  section  2  we  made  what  we  believe  is  an  important 
distinction  between  systematicity  and  appropriateness 
that  helps  distinguish  the  problem  of  representing  and 
propagating  relational  structures  from  the  issue  of  the 
strength  of  such  a  propagation.  In  section  5  we  showed 
how  these  two  factors  could  be  integrated  by  making  the 
propagation  of  bindings  from  one  structure  to  another 
sensitive  to  the  types/features  of  the  fillers  in  the  source 
structure.  The  idea  that  the  strength  of  a  rule  firing  can  be 
defined  as  a  function  of  the  types  of  argument  fillers  is  an 
important one and does more than “lend a little more
[flexibility to] modus ponens.”

It  is  true  that  we  carried  out  a  detailed  treatment  of 
deductive  reasoning  only  in  the  target  article.  We  did  this 
to  investigate  fully  the  strengths  and  weaknesses  of  the 
temporal synchrony approach as a mechanism for representing
and systematically propagating relational structures.
Note, however, that the predictions about WMRR
capacity and the restriction on the form of rules would
apply not only to deductive reasoning but also to evidential,
abductive, and analogical reasoning. This should
further convince the reader of the methodological significance
of separating the issues of systematicity and appropriateness.
Unlike Touretzky & Fahlman and Hirst &
Wu, several other commentators (Barnden, Feldman,
Hampson, Martin, Oaksford & Malloch, Ohlsson, Palm,
and  Strong)  apparently  had  no  difficulty  seeing  the 
broader  significance  of  our  model. 

R2.2.2.  Reflexive  reasoning  is  limited  reasoning.  Not  only 
do  Dawson  &  Berkeley  seem  not  to  have  understood  what 
our work is about (they think that it is only relevant for
deduction), but they also criticize it for not being as
powerful as a full-blown theorem prover. Restating the
constraints we identified on the capacity of WMRR and
the form of rules, they write that “although these two
limitations are acknowledged,” we have failed to note “the
full extent of the problems they produce.” Dawson &
Berkeley  don’t  realize  that  we  consider  the  identification 
of  these  “limitations”  a  major  contribution  of  our  work  in 
that  it  helps  delineate  reflexive  reasoning  from  reflective 
reasoning. 

R2.3. Plausible and possible inference. We agree with
Bauer that a reflexive reasoning system should be able to
separate plausible inferences from the vast set of possible
inferences and that a “model” should be capable of representing
implicit information that can be made explicit if
and when the need arises. Bauer seems to have the
incorrect belief, however, that SHRUTI lacks these attributes.
This may stem from wrongly assuming that (1)
SHRUTI only encodes plausible inferences in the LTKB,
and (2) everything SHRUTI infers has to be explicitly
represented in it.

Possible  inferences.  Consider  the  case  of  forward  (pre¬ 
dictive)  reasoning.  Given  an  input,  the  set  of  inferences 
that SHRUTI can draw using its LTKB corresponds to the
set of possible inferences. In the purely deductive case,
this set consists of the inferences that can be derived by
the repeated application of modus ponens to the rules in
the LTKB plus the input, without exceeding the capacity
limitations of WMRR and the bound on the length of
individual derivations. Notice that there is a clear distinction
between (1) the set of all possible deductions that
SHRUTI can compute from its LTKB plus the input, and (2)
the set of all logically possible deductions that follow from
the same LTKB plus the input. The former excludes a
large number of valid deductions whose derivations
would cause WMRR capacity to be exceeded or whose
length  would  exceed  the  depth  bound.  Hence  SHRUTI 
provides  a  natural  explanation  for  why  a  large  class  of  valid 
deductions  cannot  be  made. 

Plausible inferences. The possible inferences drawn by
SHRUTI will soon decay because of a dispersion of synchronous
activity unless they are reinforced by subsequent
inferences (see sect. 8.5). Note that inferences that
reinforce each other are the ones that produce the same
dynamic bindings. This means that in a system based on
temporal synchrony, inferences that reinforce one another
produce coherent activity (literally) and therefore
survive long enough to affect other processing or get
stored in medium-term memory. Thus plausible inferences
correspond to inferences that are reinforced by
other inferences or inputs and therefore survive. Implausible
inferences are the ones that stand alone, and hence,
soon decay. SHRUTI predicts that after each input, all
possible inferences get drawn but only the plausible ones
survive.

Consider an extended system encoding evidential rules
and the ability to combine forward and backward reasoning,
in other words, a system capable of abductive reasoning.
Now consider the input “John bought a book” followed
by “It is Susan’s birthday.” The input “John bought a
book” will lead to a number of possible inferences. These
might include, among other things: “John wants to read
the book” and “John wants to give the book to someone.”
The input “It is Susan’s birthday” will likewise lead to
a number of possible inferences. Some of these will reinforce
the inferences from the previous input. In particular,
some of the inferences triggered by “It is Susan’s
birthday” will reinforce the prior inference “John wants to
give the book to someone” and also provide the binding
“Susan” for the recipient role. Thus, the inference “John
wanted to give the book to Susan” may emerge as the
coherent (plausible) inference and survive, whereas the
inference “John wants to read the book” may decay.
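
The distinction between possible and plausible inferences can be illustrated with a small sketch. It is not the SHRUTI network itself: temporal synchrony and decay are replaced here by a simple count of how many inputs support an inference, and the rule table, the function names, the depth bound, and the support threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy rules: a fact (or inference) and what it can trigger.
RULES = {
    "John bought a book": [
        "John wants to read the book",
        "John wants to give the book to someone",
    ],
    "It is Susan's birthday": ["Someone wants to give Susan a gift"],
    "Someone wants to give Susan a gift": [
        "John wants to give the book to someone",
    ],
}


def possible_inferences(fact, rules, depth_bound=3):
    """All inferences reachable from `fact` within `depth_bound` rule firings."""
    seen = set()
    frontier = {fact}
    for _ in range(depth_bound):
        frontier = {c for f in frontier for c in rules.get(f, [])} - seen
        seen |= frontier
    return seen


def plausible_inferences(inputs, rules, depth_bound=3, min_support=2):
    """Inferences supported by at least `min_support` inputs.

    Stand-in for 'reinforced inferences survive, isolated ones decay'.
    """
    support = defaultdict(int)
    for fact in inputs:
        for inference in possible_inferences(fact, rules, depth_bound):
            support[inference] += 1
    return {inf for inf, n in support.items() if n >= min_support}


inputs = ["John bought a book", "It is Susan's birthday"]
survivors = plausible_inferences(inputs, RULES)
```

Under these assumptions, “John wants to read the book” is drawn as a possible inference but is supported by only one input, so it does not survive, whereas “John wants to give the book to someone” is supported by both. The sketch does not model the binding of “Susan” to the recipient role.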

Another  apparent  misconception  of  Bauer’s  is  that 
SHRUTI  can  only  infer  something  that  is  represented 
explicitly  in  LTKB!  This  is  not  the  case.  SHRUTI  is  capable 
of  performing  inference,  which  means  that  it  can  make 
explicit  things  that  are  only  implicit  in  the  LTKB  or  the 
input.  For  example,  if  the  system  is  told  “Harry  bought  a 
Rolls-Royce”  it  infers  “Harry  owns  a  car”  and  thereby 
makes  explicit  something  that  was  only  implicit  in  the 
input. 

The  example  Bauer  gives  (about  driving  to  the  store) 
can  easily  be  accounted  for  in  the  system  described  in  the 
target  article.  Bauer’s  example  simply  illustrates  that  one 
of  the  “rules”  in  the  LTKB  should  embody  the  following 
commonsense  knowledge: 

If an agent goes from a source s to a destination d during a time
interval [t1, t2], then for any location l on the path from s to d
there exists a time t in the interval [t1, t2] such that the agent will
be at l at time t.
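
One way to write this rule in first-order notation is sketched below. The predicate names go, onpath, and at are assumptions introduced here for illustration; the target article may encode the rule differently.

```latex
\forall a, s, d, t_1, t_2, l:\quad
\mathit{go}(a, s, d, [t_1, t_2]) \wedge \mathit{onpath}(l, s, d)
\;\Rightarrow\;
\exists t \in [t_1, t_2]:\ \mathit{at}(a, l, t)
```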

A little pause will convince the reader that the knowledge
expressed above would be part of our common
sense. The LTKB can also be assumed to include other
pieces of common sense such as “driving from a to b
implies going from a to b” and “the distance from the
source to a point along the path is a fraction of the total
path distance.” If the LTKB contains such commonsense
knowledge, then given “John drove from his home to the
store,” a SHRUTI-like system will be able to answer all the
queries of the form, “Did John drive a third of the distance
between his home and the store?”

R2.4. SHRUTI and the LTKB assumption. Like Bauer, Munsat
also expects the right sort of behavior from a reflexive
reasoner. He feels that neither SHRUTI nor any other
system based on the LTKB-assumption (see Munsat) can
embody reflexive-reasoning ability. We think this is too
pessimistic and that an extended SHRUTI-like system
would be capable of performing the sort of reasoning
described by Munsat. His misgivings arise partly from the
same set of misunderstandings that led Bauer to conclude
that (1) everything inferred by SHRUTI has to be represented
explicitly in the LTKB, and (2) there is no distinction
between possible and plausible inference.

It is surprising that Munsat wonders how the right rules
and facts become active from among the millions of rules
and facts, because this is one of the core problems SHRUTI
addresses! Perhaps Munsat wrongly thinks the LTKB is
an unstructured set of propositions. If the LTKB were an
unstructured set of propositions, Munsat’s concerns
would certainly be appropriate. In SHRUTI, however, the
rules  in  the  LTKB  are  highly  organized  and  form  an 
inferential  dependency  graph  in  which  they  are  direct 
(hardwired)  mappings  from  predicates  to  predicates  and 
provide  the  necessary  inferential  paths  for  the  automatic 
and  efficient  computation  of  inferences.  In  the  case  of  an 
input,  these  hardwired  mappings  lead  to  all  and  only  the 
possible  inferences  that  follow  from  the  input.  For  exam¬ 
ple,  the  input  “Sally  bought  a  Rolls-Royce”  would  physi¬ 
cally  cause  the  activity  “Sally  owns  a  car”  but  not  “the 
moon  is  a  satellite.” 
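
The idea of an inferential dependency graph can be sketched in a few lines. This is a simplification, not the network described in the target article: rules are stored as role-to-role mappings between predicates, and activity spreads only along those mappings. The predicate names, role names, and the one-step type table are hypothetical.

```python
# Hypothetical toy LTKB: each rule is a hardwired mapping from the roles of an
# antecedent predicate to the roles of a consequent predicate.
RULES = {
    "buy": [("own", {"buyer": "owner", "bought": "owned"})],
    "own": [("can-sell", {"owner": "seller", "owned": "sold"})],
    "orbit": [("satellite-of", {"orbiter": "satellite", "orbited": "primary"})],
}
IS_A = {"Rolls-Royce": "car"}  # assumed one-step type hierarchy


def propagate(predicate, bindings, rules=RULES):
    """Return every predicate instance reachable from the input along rule paths."""
    active = []
    worklist = [(predicate, bindings)]
    while worklist:
        instance = worklist.pop()
        if instance in active:
            continue  # already active; avoids looping on cyclic rule sets
        active.append(instance)
        pred, filled = instance
        for consequent, role_map in rules.get(pred, []):
            mapped = {role_map[r]: f for r, f in filled.items() if r in role_map}
            worklist.append((consequent, mapped))
    return active


def holds(active, predicate, **query):
    """True if an active instance of `predicate` matches the queried fillers.

    A filler matches either itself or its IS_A type (Rolls-Royce matches car).
    """
    for pred, filled in active:
        if pred != predicate:
            continue
        if all(
            filled.get(role) == value or IS_A.get(filled.get(role)) == value
            for role, value in query.items()
        ):
            return True
    return False


active = propagate("buy", {"buyer": "Sally", "bought": "Rolls-Royce"})
```

Here the input instance of buy activates own and can-sell because rule paths connect them, while satellite-of is never reached: no search over unrelated rules takes place.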

Munsat  also  worries  about  the  need  for  a  homunculus 
to  decide  what  makes  sense.  One  of  the  appeals  of 
connectionist  models  is  that  they  offer  an  alternative 
interpretation  of  what  it  means  to  “make  sense.”  These 
include  (related)  notions  such  as  reaching  a  locally  mini¬ 
mum  energy  state,  being  in  an  attractor  state,  or  forming  a 
stable  coalition.  These  are  related  notions,  and  in  the 
context  of  reasoning  they  correspond  to  activity  states 
where,  for  example,  the  cause-and-effect  relations  be¬ 
tween  active  predicates  are  mutually  reinforcing. 

Munsat  rightly  observes  that  people  can  tell  you  what 
would  have  to  be  the  case  for  a  story  line  to  make  sense. 
But  isn’t  this  exactly  what  abductive  reasoning  captures? 
So given “John slipped on the floor,” an abductive reasoner
might come up with the hypothesis “the floor might
have  been  wet,”  and  “someone  might  have  mopped  the 
floor.”  Of  course,  the  LTKB  would  have  to  include  the 
commonsense  knowledge  that  the  floor’s  being  wet  can 
lead  to  someone  slipping  and  falling,  and  that  mopping 
the  floor  causes  it  to  be  wet.  But  this  is  exactly  the  sort  of 
commonsense  knowledge  we  would  expect  to  be  in  the 
LTKB  of  an  agent. 
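
A minimal sketch of this kind of abduction, assuming the causal knowledge is stored as a table from effects to possible causes. The table contents, the string encoding of facts, and the function `abduce` are illustrative assumptions, not the system described in Ajjanagadde (1991).

```python
# Hypothetical causal knowledge: effect -> possible causes.
CAUSES = {
    "slipped(John, floor)": ["wet(floor)"],
    "wet(floor)": ["mopped(someone, floor)"],
}

def abduce(observation, causes=CAUSES, max_depth=3):
    """Collect candidate explanations by following causal links backward."""
    hypotheses = []
    frontier = [observation]
    for _ in range(max_depth):
        next_frontier = []
        for effect in frontier:
            for cause in causes.get(effect, []):
                if cause not in hypotheses:
                    hypotheses.append(cause)
                    next_frontier.append(cause)
        frontier = next_frontier
    return hypotheses

explanations = abduce("slipped(John, floor)")
```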

About the joke from “Cheers.” We believe that a reasonable
modeling of the LTKB of an agent exposed to
popular  TV  fare  would  allow  the  modeling  of  the  joke  in 
question.  We  do  not  think  our  ability  to  understand  such 
jokes  implies  anything  magical  about  the  contents  of  our 
LTKB  or  our  reasoning  ability  -  at  least  not  in  any  way 
that  transcends  our  already  remarkable  ability  to  perform 
reflexive  reasoning. 

Carson rightly points out that it would be unrealistic to
assume each predicate to have all the arguments required
for accommodating the potentially large number of modifiers
that might arise in various situations. The problem
can be solved as follows: Predicates are assumed to “inherit”
arguments in much the way that concepts inherit
attribute values. For example, arguments such as location
and time-of-occurrence may be associated with the general
predicate event and not replicated in predicates
corresponding to more specific types of events such as
sell. When the sell predicate is instantiated, the appropriate
rule (the one that encodes: “sell is an event”) will lead
to an instantiation of the predicate event. Once event is
active, its arguments location and time-of-occurrence
would become available and may be bound to the value of
location or time-of-occurrence provided by modifiers.
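
The inheritance of arguments can be sketched as a walk up a predicate hierarchy. The table below and the function `available_arguments` are assumptions for illustration; in SHRUTI the same effect would come from the rule “sell is an event” activating the event predicate.

```python
# Hypothetical predicate hierarchy: predicate -> (parent predicate, own arguments).
PREDICATES = {
    "event": (None, ["location", "time_of_occurrence"]),
    "sell": ("event", ["seller", "buyer", "item"]),
}

def available_arguments(predicate, predicates=PREDICATES):
    """Return the predicate's own arguments followed by those it inherits."""
    args = []
    while predicate is not None:
        parent, own = predicates[predicate]
        args.extend(own)
        predicate = parent
    return args

sell_args = available_arguments("sell")
```

Here sell does not replicate location or time_of_occurrence; they become available through event.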

R2.5. Some real limitations. Several commentators have
pointed out some real limitations in the expressive power
of SHRUTI vis-à-vis reflexive reasoning. Although some of
these can be readily overcome, others would require
significant effort.

Cottrell (para. 4) and Carson point out that the target
article does not deal with negation. The abductive reasoning
system described in Ajjanagadde (1991) suggests one
way of handling it. Cottrell points out another promising
alternative. Referring to some of his earlier work, Cottrell
cautions us that the introduction of negation might slow
down some of the computations. The difficulties Cottrell
refers to, however, were partially due to his use of default
logic (Reiter 1980) as a framework for modeling inheritance
with exceptions. We have argued elsewhere (Shastri
1988a) that default logic is not the appropriate tool for
modeling what is essentially a problem of evidential/
probabilistic reasoning. We have also shown that an evidential
treatment of this problem leads to a connectionist
network that can compute inheritance with exceptions
effectively in time that is just proportional to the depth of
the inheritance hierarchy. The inclusion of negation
would support the encoding of rules such as A(x) ⇒ ¬B(x)
and would in turn allow the system to draw inferences of
the type “John is not taller than himself.”

Barnden (para. 8), Hummel & Holyoak (para. 3), and
Carson (para. 7) point to an important restriction on the
representation of dynamic structures in SHRUTI. SHRUTI
can only represent dynamic structures containing first-order
bindings - namely, the fillers of arguments in a
dynamic structure must be entities; they cannot themselves
be dynamic structures. Note, however, that an
entity can be a complex structure as long as this structure
is static, that is, built out of hardwired nodes and links.

A  possible  way  of  expressing  higher-order  bindings  is  to 
use  a  richer  temporal  structure  than  the  one  used  in 
SHRUTI.  In  such  a  scheme,  first-order  bindings  would  be 
represented  by  very  fine  synchronization  using  short 
cycle  times  and  narrow  windows  of  synchrony,  while 
higher-order  bindings  would  be  represented  by  coarse 
synchronization using long cycle times and wider windows
of synchrony. Koerner seems to be advocating such
a  multilevel  temporal  representation.  The  problem  with 
this  approach,  however,  is  that  it  can  lead  to  complex  and 
potentially  unstable  activity  (see  Koerner). 

We  believe  that  reflexive  reasoning  primarily  involves 
first-order  bindings  and  many  problems  that  seem  to 
require  higher-order  bindings  can  be  reformulated  so  as 
to require only first-order bindings. For example, consider
the representation of the nested structure: go(John,
path(at(home))), which may be read as “John went on a
path that led to his home.” A dynamic representation of
this structure might appear to require a third-order binding
for the second argument of go. Such a nested structure,
however, can be expressed as a dynamic structure
involving only first-order bindings by assuming that an
instantiation of go creates a flat dynamic structure via the
“rule”:

∀ x:thing, y:thing  go(x,y) ⇒ ∃ p:path, l:location
go'(x,p) ∧ to'(p,l) ∧ at'(l,y)

which says that go(x,y) means that there exists a path p
and a location l, such that l is “at y,” p is the path to l, and x
goes on p. (Additional rules would specify the meanings of
the predicates go', to', and at'.)
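
The flattening performed by this rule can be sketched by creating fresh entities for the existentially quantified path and location. The helper names `new_entity` and `expand_go`, and the naming scheme for fresh entities, are assumptions for illustration only.

```python
from itertools import count

_fresh = count(1)

def new_entity(kind):
    """Create a fresh entity name standing for an existentially quantified variable."""
    return f"{kind}{next(_fresh)}"

def expand_go(x, y):
    """Expand go(x, y) into flat facts whose arguments are all plain entities."""
    p = new_entity("path")
    loc = new_entity("location")
    return [("go'", x, p), ("to'", p, loc), ("at'", loc, y)]

flat = expand_go("John", "home")
```

Every argument in the result is an entity, so only first-order bindings are needed; the nesting is carried by the shared path and location entities.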

In the target article we had conjectured that our ability
to incorporate novel, rulelike information during reflexive
reasoning may be extremely limited. Although Barnden
does not reject this claim outright, he expresses some
doubts and offers a counterexample. But Barnden’s is not
necessarily a counterexample. As stated in section 5,
paragraph 7, the use of types allows certain rulelike
information to be expressed as a “fact.” Specifically, if the
“rule” involves only unary predicates in the antecedent
then it can be expressed as a fact. Thus, instead of
expressing “Everyone at the party was a toothbrush salesperson”
as a rule ∀x at-party(x) ⇒ sells(x, Toothbrush)
one could express it as sells(at-party, Toothbrush), where
at-party just refers to the set of people who were at a
particular party. Barnden’s example, however, does highlight
that a reflexive reasoner should be capable of referring
to dynamic “sets” such as “the people at a specific
party.”
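
A minimal sketch of treating such a rule as a fact about a type, assuming a simple IS-A table. The entity names, the table, and the helpers `ancestors` and `holds` are hypothetical and stand in for the type hierarchy of the target article.

```python
# Hypothetical IS-A links: entity or type -> immediate supertype.
IS_A = {"Bob": "at_party", "Alice": "at_party", "Carol": "person"}

# The rule "everyone at the party sells toothbrushes" stored as one fact about a type.
FACTS = {("sells", "at_party", "Toothbrush")}

def ancestors(entity, is_a=IS_A):
    """Return the entity followed by its supertypes, most specific first."""
    chain = [entity]
    while chain[-1] in is_a:
        chain.append(is_a[chain[-1]])
    return chain

def holds(predicate, agent, obj, facts=FACTS):
    """True if the fact is stored for the agent or for any type the agent belongs to."""
    return any((predicate, t, obj) in facts for t in ancestors(agent))

bob_sells = holds("sells", "Bob", "Toothbrush")
```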

R2.6. Relation between reflexive and reflective reasoning.
Several questions are posed by Ohlsson and Martin
concerning the relation between reflexive and reflective
reasoning; some answers are provided by Oaksford &
Malloch and Hampson.

Shift from reflective to reflexive. As suggested in the
target article, rules that participate in reflexive reasoning
must be integrated into the LTKB by being embedded in
the inferential dependency graph. This integration is
expected to be a slow process requiring repeated experience
or observation. Hampson (para. 2) offers some supporting
evidence and points out that this is consistent with
the general view that practice shifts processing in the
direction of automaticity (also see Strong). However,
SHRUTI also predicts that rules whose form violates the
restriction stated in section 4.9 cannot become part of a
reflexive process.

Reflexive and reflective reasoning are not disjoint processes
that use disjoint representations and mechanisms.
We think that reflexive reasoning is our primary and basic
reasoning mechanism. Reflective reasoning involves a
combination of reflexive reasoning and additional mechanisms
and representations. These would include an attentional
mechanism for “remembering” a small number of
input or inferred facts temporarily. In other words, we
will require an overt-STM that might very well correspond
to the usual notion of a working memory (Baddeley
1986).

Ohlsson wonders why agents use reflective reasoning if
reflexive reasoning is so efficient. The answer is quite
straightforward: Agents resort to reflective reasoning because
they must. If the amount of dynamic memory
required for solving a problem exceeds the WMRR capacity,
if the depth of reasoning required to solve a problem
exceeds the depth bound of reflexive reasoning, or if the
form of rules required for reasoning violates the form
constraint, the agent will have to resort to reflective
reasoning and use conscious deliberation, props, and/or
other external representations (see Oaksford & Malloch,
paras. 6-7).

R3.  Paradigmatic  issues 

R3.1. Distributed representations: The magical alternative.
The magical powers of distributed representations
are invoked by Carson to suggest that our work is misdirected.
Yet none of the existing models based on distributed
representations come anywhere close to demonstrating
the expressiveness, inferential adequacy, and
scalability of SHRUTI. We hope that the proponents of
distributed representations will recognize that a distributed
system - at least in its pristine form - cannot have
the necessary combination of expressiveness, inferential
adequacy, and scalability. As Hummel & Holyoak point
out, there is a basic tradeoff between distributed representation,
systematicity, and parallelism; no amount of
handwaving can make this tradeoff disappear.

A system using a distributed representation for arguments
and fillers can only represent one dynamic binding
at a time. How does Carson expect such a system to
perform rapid reasoning (or parsing) within the desired
time scale? It should not come as a surprise that DCPS and
TPPS, two systems based on distributed representations,
were serial at the knowledge level and could only apply
one rule at a time.

Rohwer  recognizes  the  advantage  of  using  temporal 
synchrony,  suggesting  that  to  utilize  space  in  an  optimal 
manner one should use temporal synchrony in combination
with distributed representation. But by using only
temporal synchrony and interleaved node activity, a distributed
representation system can only represent a small
number  of  dynamic  bindings.  Hence  the  “optimal”  use  of 
space  will  mean  giving  up  the  ability  to  represent  a 
large  number  of  dynamic  bindings  simultaneously  and 
knowledge-level  parallelism. 

As Hummel & Holyoak point out, SHRUTI also uses
“distributed representations.” An n-ary predicate is represented
by a collection of n + 2 nodes and hence a
dynamic fact is a pattern of activity distributed over
several nodes. SHRUTI does use a localist representation
of arguments (note, however, that although each role is
localized in the abstract representation, it is physically
distributed, because it is represented by a cluster of cells).
The (abstract) localization of roles is essential in any
system that must represent a large number of dynamic
bindings simultaneously. Indeed, it is their localization
that enables SHRUTI to represent and propagate simultaneously
a large number of dynamic bindings and to
exhibit knowledge-level parallelism.

As far as entities are concerned, the encoding of an
entity can be viewed as a distributed pattern over the
collection of nodes that make up the type hierarchy. If one
augments the representation of types (concepts) with
attribute values (see Shastri 1988a; Shastri & Feldman
1986) then the “distributed” nature of the representation
of each entity becomes even more apparent. Observe that
the key to encoding similarity is the use of shared representation
and it is this sharing that gives distributed
representations the ability to capture similarity. The type
hierarchy also leads to such a sharing of representation
and, hence, allows SHRUTI to capture similarity.

We agree with Dorffner that a reflexive-reasoning
system should have a more fluid and dynamic view of
compositionality. It should be capable of zooming in and
out over representations effortlessly at different levels of
granularity and of interpreting a situation/input relative
to its current goals (re: Dorffner’s example of our interpretation
shifting from “a blob,” “a ladder,” to “a chair on a
table”). But we fail to see what this ability has to do with
distributed representations per se. Dorffner also advocates
a flexible interpretation of roles. In a sense, SHRUTI
already exhibits such flexibility. Consider the activity
resulting from the input “John owns a car.” This input will
result in the roles “owner” and “potential-seller” firing in
synchrony with John. One can view this activity over the
role nodes as the distributed pattern corresponding to the
“soft” role being filled by John. Now imagine that there
are certain types of objects, say foo, that can be owned but
not sold. This knowledge would be encoded as the appropriate
type restriction on the rule between own and can-sell.
Now if we present the input “John owns a foo,” the
resulting activity will only involve the role “owner” firing
in synchrony with John. Thus in this situation, John can
be viewed as filling a different role given by a different
pattern of activity over the role nodes.
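
The two activity patterns can be sketched by assigning each entity a firing phase and letting bound roles fire in the phase of their filler. The phase numbers, role-node names, type table, and both functions are assumptions for illustration; the type restriction is reduced to a simple lookup.

```python
# Hypothetical type information; "foo" is the text's example of an
# object type that can be owned but not sold.
TYPE_OF = {"Car1": "car", "Foo1": "foo"}
UNSELLABLE_TYPES = {"foo"}

def role_phases(owner, owned, entity_phase):
    """Map each active role node to its firing phase after the input own(owner, owned)."""
    phases = {
        "own.owner": entity_phase[owner],
        "own.owned": entity_phase[owned],
    }
    # Assumed encoding of the type restriction on the rule own -> can-sell.
    if TYPE_OF[owned] not in UNSELLABLE_TYPES:
        phases["can-sell.potential-seller"] = entity_phase[owner]
        phases["can-sell.item"] = entity_phase[owned]
    return phases

def roles_in_synchrony(entity, phases, entity_phase):
    """List the role nodes firing in the same phase as the given entity."""
    return sorted(r for r, ph in phases.items() if ph == entity_phase[entity])

entity_phase = {"John": 0, "Car1": 1, "Foo1": 1}
with_car = roles_in_synchrony("John", role_phases("John", "Car1", entity_phase), entity_phase)
with_foo = roles_in_synchrony("John", role_phases("John", "Foo1", entity_phase), entity_phase)
```

With a car, John is in synchrony with both the owner and potential-seller roles; with a foo, only with owner.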

Halford offers a comparison of the tensor product
approach (TPA) to dynamic bindings and the approach
used in SHRUTI. We welcome the comparison but disagree
with some of the specifics. For example, Halford says that
the TPA representation of R(a,b,c) also represents the
influence of c on R(a,b). But even SHRUTI’s representation
has a similar ability. Consider the representation of
give(John, Mary, x) and the inferences that would follow
from this partially instantiated relation. Now imagine
introducing the binding (g-obj = a-valentine) in the above
relation instance resulting in give(John, Mary, a-valentine).
A number of additional inferences would now follow.
Would these additional inferences not denote the
effect of adding “a-valentine” to give(John, Mary, x)?
Halford also suggests that given R(a,b,c), it is meaningful
to talk about R(a,b) in TPA. But does the ability to deal
with partially instantiated relations not confer the same
power on SHRUTI? Finally, Halford indicates that TPA
supports the retrieval of any argument filler given the
predicate and the remaining argument fillers. This seems
to correspond to SHRUTI’s ability to answer wh-queries
(see sect. 4.7).
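
Retrieval of a missing filler from the predicate and the remaining fillers can be sketched as pattern matching over stored facts. The fact list (including the second fact) and the function `answer_wh` are assumptions for illustration and say nothing about how section 4.7 realizes wh-queries in the network.

```python
# Hypothetical long-term facts; the second one is invented for the example.
FACTS = [
    ("give", "John", "Mary", "a-valentine"),
    ("give", "John", "Susan", "Book1"),
]

def answer_wh(predicate, *pattern, facts=FACTS):
    """Return the fillers of the unspecified (None) argument positions, one tuple per match."""
    answers = []
    for fact in facts:
        name, fillers = fact[0], fact[1:]
        if name != predicate or len(fillers) != len(pattern):
            continue
        if all(p is None or p == f for p, f in zip(pattern, fillers)):
            answers.append(tuple(f for p, f in zip(pattern, fillers) if p is None))
    return answers

what_did_john_give_mary = answer_wh("give", "John", "Mary", None)
```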

R3.2. SHRUTI and the classical approach. Several commentators
see the ghost of classical AI in our model (Dawson &
Berkeley, Dorffner, and Carson). Dawson & Berkeley
also see SHRUTI as a mere implementation of classical
ideas. Our response has two parts. First, we believe that
any model of cognition will have to exhibit some of the
functionality identified by the classical approach. We
cannot simply discard the notions of systematicity and
compositionality - what we need to do instead is discard
the view that systematicity and compositionality have to
be retained in their unconstrained and unfettered form.
The interesting challenge is to determine the appropriate
form and extent of systematicity and compositionality that
cognitive models must support. If we draw the line too far
to the left, we can end up with either a type of associationist
glob that opponents of connectionism love to attack or
models that work on toy examples but do not seem to have
any hope of scaling to larger problems. If we draw the
boundary too far to the right we can end up with attempts
at building “connectionist” machines for doing list processing
- an interesting exercise, but lacking any cognitive
significance (Touretzky 1990).

As Feldman, Ohlsson, Martin, Strong, and Oaksford
& Malloch point out, ours is a different approach. We are
trying to build a model of reflexive reasoning that respects
the essential constraints imposed by the underlying
computational architecture but that at the same time has (1)
considerable representational and inferential power, (2) a
limited yet potentially adequate ability to deal with
systematicity and compositionality, and (3) requisite scaling
power.

The comment that SHRUTI is a mere implementation of
the classical approach simply misses the point. Ohlsson
(para. 8), Martin (paras. 9-10), and Oaksford & Malloch
(paras. 1-4) spell out the relation between SHRUTI and the
classical approach. The key observation is that our “implementation”
leads to a set of constraints and predictions
about the nature of reflexive processing that are unique to
this implementation.

Dawson & Berkeley offer three reasons why our model
is a classical one. We have already responded to their
comments about biological plausibility in R1.10 and we
will respond to their claim that our system has a central
controller in R5.3. Let us briefly comment on the “squiggly
line” Dawson & Berkeley point to as the “smoking
gun” that proves our system is a classical rule-based one!
The squiggly line in Figure 19 was drawn to help the
reader delineate the representations introduced in sections
3-4 from those introduced in section 5. By no stretch
of imagination does this line separate the “data structures
being processed” from “the rules governing system inferences.”
Both the type hierarchy and the rule base embody
some data and some processing in the traditional sense of
these terms. We strongly urge Dawson & Berkeley to
reread section 3.4, paragraph 5, and the article by Hatfield
(1991) cited therein.

R3.3. On the AI paradox. Our claim that we have taken a
step toward resolving the AI paradox (see Abstract) is
contested by Hölldobler, yet nothing in his commentary
contradicts the basis for our claim, namely, that work in AI
has  not  offered  a  credible  account  of  how  humans  can 
rapidly  perform  a  wide  range  of  reasoning  in  time  that 
does  not  seem  to  increase  with  the  size  of  their  knowledge 
base.  The  results  in  AI  have  either  been  negative  and 
shown  that  even  very  “simple”  types  of  reasoning  are 
intractable,  or  they  have  offered  characterizations  of 
“complex”  reasoning  classes  that  require  too  much  space 
or  time,  or  produced  positive  results  that  are  about  overly 
restrictive  forms  of  reasoning  (see  sect.  9  for  references). 

So  if  our  predictions  about  reflexive  reasoning  were  to 
hold,  we  would  indeed  have  taken  a  step  toward  resolving 
the  AI  paradox  by  showing  that  there  exists  a  class  of 
reasoning  that  can  be  performed  with  requisite  efficiency 
and  that  is  powerful  enough  to  cover  a  significant  range  of 
reasoning  that  people  can  perform  reflexively. 

Hölldobler graciously observes that our “logic” has
some remarkable features but he remarks that its expressive
power is “fairly limited” from a “logical point of view.”
He does not seem to realize that the fundamental issue is
not how powerful or weak reflexive reasoning is from a
logical point of view. If it turns out that such reasoning
corresponds to a “simple” logic, so be it! Hölldobler seems
to want to hold us responsible for AI researchers’ failure to
investigate “simpler” logics.

We appreciate Hölldobler’s pointers to related work on
automated theorem proving. Note, however, that the
result Hölldobler discusses involves a stronger restriction
on the form of rules than what we impose in section 4.9.
Our restriction concerns only variables that occur more
than once in the antecedent. Hölldobler mentions a
stronger restriction that covers all variables occurring in
the antecedent.

We are not surprised by Hölldobler’s observation that
as far as deductive reasoning is concerned, SHRUTI’s
inferential power is a special case of some more general-purpose
theorem prover. This observation is of some
value, but such a posteriori analysis should not be mistaken
for the actual identification of an interesting and
significant special case of a general problem.

R3.4. Do static bindings suffice? It is argued by Cooper
that dynamic bindings may not be relevant because rule-based
reasoning may be the wrong paradigm for modeling
intelligence. As pointed out in the target article and
in R2.2, however, the dynamic-binding problem transcends
a narrow reading of “rule-based reasoning.”
Cooper refers to case-based reasoning (CBR) and implies
that CBR may not require dynamic bindings! A little
reflection, however, should make it clear that using a case
would require binding (on the fly) its roles/slots to the
appropriate entities in the current situation. Furthermore,
any but a trivial CBR system would have to propagate
some of these bindings in order to solve the indexing
problem. Cooper argues that a full-fledged treatment of
n-ary predicates may be unnecessary and counterproductive.
We agree! Indeed, SHRUTI does not offer such a
universal treatment (see sects. 4.9, 6, and 8).

Cooper also suggests that we need only solve the
binding problem for feasible pairings of roles and fillers;
because the number of feasible role-filler pairings is not
astronomical, it may be possible to dedicate nodes to each
of the feasible bindings. He concludes accordingly that it
may be possible to get by without dynamic bindings. He
seems to be making a crucial error, however, because a
system must deal not only with feasible bindings but also
with nonfeasible ones. Consider the sentence “The Grand Canyon
gave a computer to a monkey!” Surely the binding
between giver and Grand Canyon is not feasible in
Cooper’s sense of the word. Yet we have no trouble in
creating this binding and answering questions about who
gave what to whom. So we are quite capable of representing
essentially arbitrary pairings between concepts/
instances and conceptual roles without requiring repetition,
attention, and reflection.

R4.  Learning 

A number of commentators point out that we have not
investigated learning in detail. We agree. We also agree
with Martin that pursuing learning within SHRUTI will
provide an additional set of constraints that may lead to
further insights into the nature of reflexive reasoning. In
the target article we mentioned that we have a plausible
solution to the problem of one-shot learning of facts. We
are also pursuing the problem of incremental rule learning
(see below). As Hampson points out, the integration of
rules into the LTKB can be a slow and gradual process
requiring considerable exposure to a variety of relevant
situations. This is to be expected, because to learn a rule
involves learning the correct argument mapping as well as
the associated functions (see sect. 5.5) that embody the
appropriateness of these mappings (Cottrell aptly refers
to these functions as “semantic filters”).

Grossberg summarizes work done by him and his
colleagues on a family of learning systems that can extract
rules from data. We would like to evaluate the power of
these systems in the context of a SHRUTI-like reflexive-reasoning
system.

Cottrell argues that using the pattern containing inference
alternative (PCIA) discussed in section 9.4 will lead
to a number of advantages in learning rules and semantic
filters associated with rules. He does not realize that the
temporal synchrony approach used in SHRUTI can support
all the advantages he cites. This is illustrated in Figure
R1, which makes it clear that semantic filters as well as
covariance constraints can be learned within the temporal
synchrony approach (a more detailed description of this
approach will appear in Shastri 1993b). Figure R1a shows
three groups of nodes. The one on the bottom left is the
collection of all the predicate nodes. For simplicity, we
have only shown role (argument) nodes of predicates and
omitted the enabler and collector nodes. The group on
the right consists of all the type or feature nodes. As
discussed in R3.1, the representation of each entity can
be viewed as a pattern of activity over the collection of
type or feature nodes. The collection of nodes on the top is
the “hidden” structure consisting of a layer of τ-or nodes
sandwiched between two layers of ρ-btu nodes. The
arrows indicate the nature of connectivity. Notice that
the role and feature nodes feed into the bottom layer of
the hidden structure and the top layer of the hidden
structure feeds back into the role nodes. The interconnection
pattern in the hidden structure is as shown: the
bottom layer feeds into the second and third layer and the
second layer feeds into the third layer. The proposed
connectivity can be shown to support the learning of
type/feature preferences/restrictions involving individual
roles as well as multiple roles.

Figure R1b shows the input activity for a particular
situation (John walked into the wall) and the associated
target activity (John got hurt). The role nodes will be
clamped to the input pattern shown in the figure (the
input need not be periodic) and the desired behavior of
the network would be the (suitably delayed) activation of
the patient role of hurt as specified in the target pattern.
One could use a suitable learning algorithm to learn the
correct weights in the network to encode the necessary
rules with the appropriate semantic filters. The above
interconnection pattern should also make it clear that
contrary to Cottrell’s (para. 7) and Carson’s (para. 6)
suggestions, the learning of rules does not require one-to-one
connectivity between all predicate role nodes.

Figure R1b. The input and target patterns for the training
situation: “John walked into a wall” (antecedent), “John got hurt”
(consequent).

R5.  Miscellaneous  issues 

R5.1. Grounding. As pointed out by Diederich, we have
not addressed how the meaning of our representation is
ultimately grounded (Harnad 1990). We recognize that
grounding  is  central  to  the  notion  of  representation,  but 
our  concern  in  this  target  article  has  been  with  issues  such 
as  expressive  power,  inferential  adequacy,  and  scalability 
of  a  biologically  plausible  representation  and  reasoning 
system.  We  will  have  to  face  the  issue  of  grounding  if  we 
want  to  start  ascribing  real  (not  imputed)  meaning  to 
nodes  and  circuits  in  SHRUTI. 

R5.2. Encoding of long-term facts and the IS-A hierarchy.
In paragraph 2 Strong argues that the encoding of a
partially instantiated fact like give(John, Susan, x) violates
the closed world assumption (CWA). He writes that
given the fact give(John, Susan, x) the CWA implies the
answer to the question give(John, Susan, Car7) should be
no. We agree; indeed, this is exactly how the system
responds. So we do not see why he believes that SHRUTI’s
response is inconsistent with the CWA. Strong is right,
however, about the encoding of a partially instantiated
fact using the IS-A hierarchy. The reasoning system augmented
with the IS-A hierarchy encodes a fact such as
∃x:Thing give(John, Susan, x) exactly as he describes.
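
The behavior under the CWA can be sketched with a yes/no query over stored facts, where an unspecified filler is marked by None. The fact store, the entity name in the query, and the function `ask` are assumptions for illustration, not the network's actual query mechanism.

```python
# The partially instantiated fact give(John, Susan, x); None marks the
# unnamed "something" that was given.
FACTS = [("give", "John", "Susan", None)]

def ask(predicate, *args, facts=FACTS):
    """Yes/no query under the closed world assumption.

    A None in the query asks whether some filler exists; a named filler
    must match a named filler in a stored fact.
    """
    for fact in facts:
        name, fillers = fact[0], fact[1:]
        if name != predicate or len(fillers) != len(args):
            continue
        if all(q is None or (f is not None and f == q) for f, q in zip(fillers, args)):
            return True
    return False  # whatever is not known to hold is taken to be false

gave_something = ask("give", "John", "Susan", None)
gave_specific_car = ask("give", "John", "Susan", "Car7")
```

The first query is answered yes and the second no, which is the response the CWA calls for.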

Palm seems to be confusing long-term facts with
medium-term facts. The intended function of what we
have called “long-term facts” is indeed the essentially
permanent recording of a situation (a static set of bindings).
Some of this confusion can be resolved by recognizing
that (1) only situations that are significant end up
being encoded as long-term facts and (2) long-term facts
are not entirely “forgotten” when the situation they encode
ceases to be true. Instead, they are tagged in some
manner to indicate that they are no longer the case. To
draw an analogy, when long-term facts cease to be true,
they do not just disappear; they continue to be around as
“ex-long-term facts.”

Palm seems to have misinterpreted the encoding of
long-term facts in Figure 12, inferring that the enabler,
collector, and argument nodes are duplicated for each
long-term fact. This is not the case. As we explained in
section 3.3, for each n-ary predicate there is only one
enabler, one collector, and n argument nodes. All the
long-term facts pertaining to this predicate share these
"general" nodes. So in Figure 12 if we were to add the
long-term fact buy(Jackie, Car7) we would only add one
additional node, namely, a fact node.
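
The node-sharing argument above can be illustrated with a small bookkeeping sketch. It is not part of SHRUTI itself: the class name `Predicate`, its methods, and the filler names are hypothetical, and the sketch counts only the node types named in this paragraph (enabler, collector, argument nodes, fact nodes), ignoring links and any other circuitry.

```python
from dataclasses import dataclass, field


@dataclass
class Predicate:
    """Hypothetical tally of the nodes used to encode one n-ary predicate."""
    name: str
    arity: int
    facts: list = field(default_factory=list)  # one entry per long-term fact

    def shared_nodes(self) -> int:
        # One enabler, one collector, and n argument nodes, shared by all facts.
        return 2 + self.arity

    def add_fact(self, *fillers: str) -> None:
        # Each long-term fact contributes exactly one additional fact node.
        if len(fillers) != self.arity:
            raise ValueError(f"{self.name} expects {self.arity} arguments")
        self.facts.append(tuple(fillers))

    def node_count(self) -> int:
        return self.shared_nodes() + len(self.facts)


buy = Predicate("buy", arity=2)
before = buy.node_count()
buy.add_fact("Jackie", "Car7")  # illustrative fillers
after = buy.node_count()
print(after - before)  # number of nodes added by the new fact
```

Under these assumptions the shared nodes are counted once per predicate, so the total grows by one node per stored fact rather than by a full copy of the predicate's nodes.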

The above confusion also leads Palm to think that the
multiple predicate banks introduced in section 6 are
required for storing multiple long-term facts (end of para.
8). This is not the case. Multiple predicate
banks have been posited for representing multiple
dynamic instantiations of a predicate, not multiple
long-term facts.

The  suggestions  by  Palm  about  encoding  IS-A  and 
predicate  hierarchies  via  set  containment  are  well  taken. 
We see two potential problems with his proposal: (1)
representation  of  multiple  dynamic  predicate  instances 
and  (2)  encoding  of  exceptional  properties/features  of 
concepts.  Palm  also  comments  about  encoding  soft  rules 
and  the  potential  problem  with  using  rate  of  firing  to 
encode  confidence.  He  seems  to  have  overlooked  the 
discussion  in  section  5.5  and  note  26. 

R5.3. SHRUTI and a central controller. Overhasty
dismissiveness seems to have led Dawson & Berkeley and
Touretzky & Fahlman to confuse a simple means of
communicating a query to SHRUTI and recovering the
answer with a central controller (see paras. 2 and 10,
respectively). They do not seem to understand that unless
we develop a complete system which accepts sensory
(speech/visual) input and produces speech/motor output,
we need to specify a way of communicating with the
system. We fail to see how posing a query to SHRUTI by
activating the argument and filler nodes and the enabler
of the query predicate and thereafter waiting for the
collector node of the query predicate to become active
can be confused with the NETL-like requirement that a
central controller direct the activity of each node at each
step of the computation!

Touretzky & Fahlman misconstrue our reasonable
conjecture that for linguistic input the phase separation in the
activity of distinct entities might begin during the parsing
process. The conjecture was made in the context of discussing how
the activity in a system might self-organize so that each
distinct entity starts firing in a distinct phase. Our
conjecture is interpreted as our somehow hiding the central
controller in a parser! They also fail to recognize that
instead of a central controller SHRUTI uses "distributed
control mechanisms" as part of its representational
machinery. As explained in section 9.1, these mechanisms
obviate the need for a central controller that directs the
activity of every node at each time step.

NOTES 

1. SHRUTI is not an acronym but a Sanskrit word which refers
to the oral tradition of communicating knowledge.

2. Bienenstock (personal communication) had also advocated
the use of aperiodic synchronous activity over periodic activity.

3. Ladu et al. (1992) report finding synchronous oscillations
in the motor and sensory cortices during the execution of simple
hand movements by analyzing magneto-encephalography data.

4. As argued in Thorpe and Imbert (1989), there are about
100,000 distinct objects that can be named rapidly by people.
Hence the number of potential argument fillers is going to be at
least 100,000 if not more.

5. Note, however, that if one is only seeking local rather than
global consistency then it is possible to seek local support and
find a locally consistent hypothesis in a reflexive manner. An
example of this may be found in the abductive reasoning system
described by Ajjanagadde (1991).

6. There are notable exceptions, such as the approach
expounded by Newell (1990; see also multiple book review, BBS
15(3) 1992). Feldman, although not a cognitive psychologist, has
long emphasized that the biological architecture places strong
computational constraints on the nature of cognitive models. A
good example of this is his well-known "hundred step" argument
(Feldman & Ballard 1982). [See also Feldman's "Four Frames
Suffice" BBS 8(2) 1985.]

7. In the case of a query, the possible inferences correspond
to the possible derivations of the query. As in the forward case,
only derivations that do not violate WMRR capacity and depth
bounds are possible.

8. These types of rules have been proposed for linking
syntactic structures with conceptual structures within a SHRUTI-like
framework (Shastri 1992).

9. It is possible to incorporate certain types of rules into our
behavior quite rapidly. Consider "hit the left button if you see an
X on the screen." Such rules, however, seem to involve a fairly
direct mapping between perception and action.

10. This would be a much longer time than the time a fact
may stay active in the WMRR via temporal synchrony but much
shorter than the time a fact may stay in medium-term memory.

11. The use of distributed representations and coarse coding
by a model does not imply that the model is cognitively
significant.


References 

Letters a and r appearing before authors' initials refer to target article and
response respectively.

Aaronson, J. (1991) Dynamic fact communication mechanism: A connectionist
interface. Proceedings of the Thirteenth Conference of the Cognitive
Science Society. Erlbaum. [aLS]

Abeles, M. (1982) Local cortical circuits: Studies of brain function, vol. 6.
Springer. [arLS]

(1991) Corticonics: Neural circuits of the cerebral cortex. Cambridge
University Press. [aLS, WJF]

Ajjanagadde, V. (1990) Reasoning with function symbols in a connectionist
system. Proceedings of the Twelfth Conference of the Cognitive Science
Society. Erlbaum. [aLS]

(1991) Abductive reasoning in connectionist networks: Incorporating
variables, background knowledge, and structured explananda. Technical
Report WSI 91-7, Wilhelm-Schickard-Institute, University of Tübingen,
Germany. [arLS, GH, DST]


BEHAVIORAL AND BRAIN SCIENCES (1993) 16:3


References/Shastri & Ajjanagadde: Association to reasoning


Ajjanagadde, V. G. & Shastri, L. (1989) Efficient inference with multiplace
predicates and variables in a connectionist system. Proceedings of the
Eleventh Conference of the Cognitive Science Society. Erlbaum. [aLS]

Allen, J. F. (1987) Natural language understanding. Benjamin
Cummings. [aLS]

Allen, J. F. & Perrault, C. R. (1980) Analyzing intention in utterances.
Artificial Intelligence 15:143-78. [GH]

Anderson, J. R. (1983) The architecture of cognition. Harvard University
Press. [aLS]

Baddeley, A. (1986) Working memory. Clarendon Press. [arLS, SS]

Bair, W., Koch, C., Newsome, W., Britten, K. & Niebur, E. (1992) Power
spectrum analysis of MT neurons from awake monkey. Society for
Neuroscience Abstracts 18(1):11.12. [MPY]

Barnden, J. A. (1992) Connectionism, generalization and propositional
attitudes: A catalogue of challenging issues. In: The symbolic and
connectionist paradigms: Closing the gap, ed. J. Dinsmore.
Erlbaum. [JAB]

Barnden, J. A. & Srinivas, K. (1991) Encoding techniques for complex
information structures in connectionist systems. Connection Science
3(3):263-309. [aLS, JAB]

Barnes, D. & Hampson, P. J. (1992) Stimulus equivalence, relational frame
theory and connectionism: Implications for behaviour analysis and
cognitive science. Proceedings of the Fifteenth Symposium on
Quantitative Analyses of Behavior. Harvard University Press. [PJH]

Bartlett, F. C. (1934) Remembering (2nd ed. 1967). Cambridge University
Press. [WJF]

Bibel, W. (1988) Advanced topics in automated deduction. In: Fundamentals
of artificial intelligence II, ed. R. Nossum. Springer. [SH]

Bienenstock, E. (1991) Notes on the growth of a "composition machine."
Presented at the Interdisciplinary Workshop on Compositionality in
Cognition and Neural Networks, Abbaye de Royaumont, May. [aLS]

Bobrow, D. & Collins, A., eds. (1975) Representation and understanding.
Academic Press. [aLS]

Bradski, G., Carpenter, G. A. & Grossberg, S. (1992a) Working memory
networks for learning temporal order with application to 3-D visual object
recognition. Neural Computation 4:270-86. [SG]

(1992b) Working memories for storage and recall of arbitrary temporal
sequences. Proceedings of the International Joint Conferences on Neural
Networks. Piscataway, NJ. [SG]

Braine, M. D. S. (1978) On the relationship between the natural logic of
reasoning and standard logic. Psychological Review 85:1-21. [MO]

Broadbent, D. E. (1958) Perception and communication. Pergamon. [MPY]

Buchanan, B. G. & Shortliffe, E. F. (1984) Rule-based expert systems: The
MYCIN experiments of the Stanford Heuristic Programming Project.
Addison-Wesley. [aLS]

Bylander, T., Allemang, D., Tanner, M. C. & Josephson, J. R. (1991) The
computational complexity of abduction. Artificial Intelligence
47(1-3):25-60. [aLS]

Cahill, A. & Mitchell, D. C. (1987) Plans and goals in story comprehension.
In: Communication failure in dialogue and discourse, ed. R. Reilly.
Elsevier. [GH]

Carpenter, G. A. & Grossberg, S., eds. (1991) Pattern recognition by
self-organizing neural networks. MIT Press. [SG]

(1992) A self-organizing neural network for supervised learning, recognition,
and prediction. IEEE Communications 30:38-49. [SG]

Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H. & Rosen,
D. B. (1992) Fuzzy ARTMAP: A neural network architecture for
incremental supervised learning of analog multidimensional maps. IEEE
Transactions on Neural Networks 3:698-713. [SG]

Carpenter, G. A., Grossberg, S. & Reynolds, J. H. (1991) ARTMAP:
Supervised real-time learning and classification of nonstationary data by a
self-organizing neural network. Neural Networks 4:565-88. [SG]

Carpenter, P. A. & Just, M. A. (1977) Reading comprehension as eyes see it.
In: Cognitive processes in comprehension, ed. M. A. Just & P. A.
Carpenter. Erlbaum. [aLS]

Celebrini, S., Thorpe, S., Trotter, Y. & Imbert, M. (1993) Dynamics of
orientation coding in area V1 of the awake primate. Visual Neuroscience
(in press). [SJT]

Chalmers, D. J. (1990) Syntactic transformations on distributed
representations. Connection Science 1&2(2):53-62. [GD, JWG]

Charniak, E. (1976) Inference and knowledge (I and II). In: Computational
semantics, ed. E. Charniak & Y. Wilks. North-Holland. [aLS]

(1983) Passing markers: A theory of contextual influence in language
comprehension. Cognitive Science 7:171-90. [aLS, GH]

Chater, N. & Oaksford, M. (1993) Logicism, mental models and everyday
reasoning: Reply to Garnham. Mind & Language 8 (in press). [MO]

Churchland, P. S., Koch, C. & Sejnowski, T. J. (1989) What is computational
neuroscience? In: Computational neuroscience, ed. E. Schwartz. MIT
Press. [MO]

Clossman, G. (1988) A model of categorization and learning in a connectionist
broadcast system. Ph.D. dissertation, Department of Computer Science,
Indiana University. [aLS]

Cohen, P. R., Morgan, J. & Pollack, M. E., eds. (1990) Intentions in
communication. MIT Press. [GH]

Collins, A. & Michalski, R. (1989) The logic of plausible reasoning: A core
theory. Cognitive Science 13(1):1-50. [rLS, DST]

Cooke, N. J. (1992) Modeling human expertise in expert systems. In: The
psychology of expertise: Cognitive research and empirical artificial
intelligence, ed. R. R. Hoffman. Springer. [GH]

Cooper, P. R. (1992) Structure recognition by connectionist relaxation: Formal
analysis. Computational Intelligence 8(1):25-44. [PRC]

Cooper, P. R. & Swain, M. J. (1992) Arc consistency: Parallelism and domain
dependence. Artificial Intelligence 58:207-35. [PRC]

Corriveau, J. (1991) Time-constrained memory for reader-based text
comprehension. Technical Report CSRI-246, Ph.D. dissertation,
Computer Science Research Institute, University of Toronto. [aLS]

Cottrell, G. (1985) Parallelism in inheritance hierarchies with exceptions.
Proceedings of the Eighth International Joint Conference on Artificial
Intelligence, Los Angeles, CA. [GWC]

(1989) A connectionist approach to word sense disambiguation.
Pitman. [GWC]

Creutzfeldt, O., Ojemann, G. & Lettich, E. (1989) Neuronal activity in the
human lateral temporal lobe. 1. Responses to speech. Experimental Brain
Research 77:451-75. [SJT]

Crick, F. (1984) Function of the thalamic reticular complex: The searchlight
hypothesis. Proceedings of the National Academy of Sciences
81:4586-90. [aLS]

Crick, F. & Koch, C. (1990a) Towards a neurobiological theory of
consciousness. Seminars in Neurosciences 2:263-75. [aLS]

(1990b) Some reflections on visual awareness. Cold Spring Harbor
Symposium on Quantitative Biology 55:953-62. [SG]

Damasio, A. R. (1989) Time-locked multiregional retroactivation: A
systems-level proposal for the neural substrates of recall and recognition.
Cognition 33:25-62. [aLS]

Davis, P. (1990) Application of optical chaos to temporal pattern search in a
nonlinear optical resonator. Japanese Journal of Applied Physics
29:L1238-40. [IT]

Dawson, M. R. W. & Schopflocher, D. P. (1992) Autonomous processing in
parallel distributed processing networks. Philosophical Psychology
5:199-219. [MRWD]

Dehaene, S. & Changeux, J-P. (1991) The Wisconsin card sorting test:
Theoretical analysis and modeling in a neuronal network. Cerebral Cortex
1:62-79. [MO]

Diederich, J. (1992) Inkrementelles Konnektionistisches Lernen. Forthcoming
habilitation thesis, Department of Computer Science, University of
Hamburg. [JD]

Dietz, P., Krizanc, D., Rajasekaran, S. & Shastri, L. (1993) A lower bound
result for the common element problem and its implication for reflexive
reasoning. Technical Report, Department of Computer and Information
Science, University of Pennsylvania (forthcoming). [rLS]

Dolan, C. P. & Smolensky, P. (1989) Tensor product production system: A
modular architecture and representation. Connection Science
1:53-68. [aLS, GSH, RR]

Dorffner, G. & Rotter, M. (1992) On the virtues of functional connectionist
compositionality. Proceedings of the Tenth European Conference on
Artificial Intelligence, ed. B. Neumann. Wiley. [GD]

Dosher, B. A. & Corbett, A. T. (1982) Instrument inferences and verb
schemata. Memory and Cognition 10(6):531-39. [GH]

Douglas, R. J., Martin, K. A. C. & Whitteridge, D. (1991) An intracellular
analysis of the visual responses of neurons in cat visual cortex. Journal of
Physiology 440:659-96. [RE]

Downing, P. (1977) On the creation and use of English compound nouns.
Language 53(4):810-42. [GH]

Dwork, C., Kanellakis, P. C. & Mitchell, J. C. (1984) On the sequential
nature of unification. Journal of Logic Programming 1:35-50. [SH]

Dyer, M. (1983) In-depth understanding: A computer model of integrated
processing for narrative comprehension. MIT Press. [aLS]

Eckhorn, R. (1991) Stimulus-specific synchronizations in the visual cortex:
Linking of local features into global figures? In: Neuronal cooperativity,
ed. J. Krüger. Springer. [SJT]

Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M. &
Reitboeck, H. J. (1988) Coherent oscillations: A mechanism of feature
linking in the visual cortex? Multiple electrode and correlation analyses in
the cat. Biological Cybernetics 60:121-30. [aLS, RE, WJF, IT, SJT]




Eckhorn, R., Grüsser, O.-J., Kröller, J., Pellnitz, K. & Pöpel, B. (1976)
Efficiency of different neural codes: Information transfer calculations for
three different neural systems. Biological Cybernetics 22:49-60. [RE]

Eckhorn, R. & Pöpel, B. (1975) Rigorous and extended application of
information theory to the afferent visual system of the cat. Biological
Cybernetics 17:7-17. [RE]

Eckhorn, R., Reitboeck, H. J., Arndt, M. & Dicke, P. (1990) Feature linking
via synchronization among distributed assemblies: Simulations of results
from cat visual cortex. Neural Computation 2:293-307. [aLS, RE]

Eichenbaum, H., Wiener, S. I., Shapiro, M. L. & Cohen, N. J. (1989) The
organization of spatial coding in the hippocampus: A study of neural
ensemble activity. Journal of Neuroscience 9:2764-75. [GWS]

Elman, J. (1991) Distributed representations, simple recurrent networks, and
grammatical structure. Machine Learning 7:195-225. [JWG]

Engel, A. K., Koenig, P., Gray, C. M. & Singer, W. (1990) Stimulus-dependent
neuronal oscillations in cat visual cortex: Intercolumnar
interactions as determined by cross-correlation analysis. European Journal
of Neuroscience 2:586-606. [WJF, MPY]

Engel, A. K., Koenig, P., Kreiter, A. K., Gray, C. M. & Singer, W. (1991)
Temporal coding by coherent oscillations as a potential solution to the
binding problem: Physiological evidence. In: Nonlinear dynamics and
neural networks, ed. H. Schuster & W. Singer. Weinheim. [aLS]

Engel, A. K., Kreiter, A. K. & Singer, W. (1992) Oscillatory responses in the
superior temporal sulcus of anesthetized macaque monkeys. Society for
Neuroscience Abstracts 16:11.10. [rLS]

Etherington, D. & Reiter, R. (1983) On inheritance hierarchies with
exceptions. Proceedings of the National Conference on Artificial
Intelligence, Washington, D.C. [GWC]

Evans, J. St. B. T. (1972) Interpretation and matching bias in a reasoning task.
Quarterly Journal of Experimental Psychology 24:193-99. [MO]

(1982) The psychology of deductive reasoning. Routledge & Kegan
Paul. [MO]

(1983) Linguistic determinants of bias in conditional reasoning. Quarterly
Journal of Experimental Psychology 35A:635-44. [MO]

(1989) Bias in human reasoning: Causes and consequences.
Erlbaum. [MO]

Fahlman, S. E. (1979) NETL: A system for representing real-world knowledge.
MIT Press. [aLS, GH, MO]

(1981) Representing implicit knowledge. In: Parallel models of associative
memory, ed. G. E. Hinton & J. A. Anderson. Erlbaum. [aLS, MO]

Fahlman, S. E., Hinton, G. E. & Sejnowski, T. J. (1983) Massively parallel
architectures for AI: NETL, thistle, and Boltzmann machines.
Proceedings of the National Conference on Artificial Intelligence. Morgan
Kaufmann. [DST]

Fahlman, S. E., Touretzky, D. S. & van Roggen, W. (1981) Cancellation in a
parallel semantic network. Proceedings of the Seventh International Joint
Conference on Artificial Intelligence. Morgan Kaufmann. [aLS]

Feldman, J. A. (1982) Dynamic connections in neural networks. Biological
Cybernetics 46:27-39. [aLS, PRC]

(1985) Four frames suffice: A provisional model of vision and space.
Behavioral and Brain Sciences 8:265-89. [PRC]

(1989) Neural representation of conceptual knowledge. In: Neural
connections, mental computation, ed. L. Nadel, L. A. Cooper,
P. Culicover & R. M. Harnish. MIT Press. [aLS]

Feldman, J. A. & Ballard, D. H. (1982) Connectionist models and their
properties. Cognitive Science 6(3):205-54. [aLS, PRC]

Fodor, J. A. (1983) Modularity of mind. MIT Press. [MO]

Fodor, J. A. & Pylyshyn, Z. W. (1988a) Connectionism and cognitive
architecture: A critical analysis. In: Connections and symbols, ed.
S. Pinker & J. Mehler. MIT Press. [aLS, DLM]

(1988b) Connectionism and cognitive architecture: A critical analysis.
Cognition 28:3-71. [GD, MO]

Freeman, W. J. (1975) Mass action in the nervous system. Academic
Press. [WJF]

(1981) A physiological hypothesis of perception. Perspectives in Biology and
Medicine 24(4):561-92. [aLS]

(1987) Simulation of chaotic EEG patterns with a dynamic model of
olfactory system. Biological Cybernetics 56:139-50. [IT]

(1991) The physiology of perception. Scientific American
264:78-85. [WJF, IT]

Freeman, W. J. & van Dijk, B. (1987) Spatial patterns of visual cortical fast
EEG during conditioned reflex in a rhesus monkey. Brain Research
422:267-76. [WJF]

Frisch, A. M. & Allen, J. F. (1982) Knowledge retrieval as limited inference.
In: Notes in computer science: Sixth conference on automated deduction,
ed. D. W. Loveland. Springer. [aLS]

Garnham, A. (1993) Is logicist cognitive science possible? Mind & Language 8
(in press). [MO]



Gawne, T. J., Eskandar, E. N., Richmond, B. J. & Optican, L. M. (1991)
Oscillations in the responses of neurons in inferior temporal cortex are
not driven by stationary visual stimuli. Society for Neuroscience Abstracts
17(1):160.16. [MPY]

Geib, C. (1990) A connectionist model of medium-term memory. Term report,
Department of Computer and Information Science, University of
Pennsylvania. [aLS]

Geller, J. & Du (1991) Parallel implementation of a class reasoner. Journal
of Theoretical Artificial Intelligence 3:109-27. [aLS]

Genesereth, M. R. & Nilsson, N. J. (1987) Logical foundations of artificial
intelligence. Morgan Kaufmann. [aLS]

Gerstein, G. L. (1970) Functional association of neurons: Detection and
interpretation. In: The neurosciences: Second study program, ed. F. O.
Schmitt. Rockefeller University Press. [aLS]

Gibbs, R. W., Jr. (1983) Do people always process the literal meanings of
indirect requests? Journal of Experimental Psychology: Learning,
Memory, and Cognition 9:524-33. [GH]

Gilbert, C. D. & Wiesel, T. (1992) Receptive field dynamics in adult primary
visual cortex. Nature 356:150-52. [JD]

Gray, C. M., Engel, A. K., Koenig, P. & Singer, W. (1991) Properties of
synchronous oscillatory neuronal interactions in cat striate cortex. In:
Nonlinear dynamics and neural networks, ed. H. Schuster &
W. Singer. Weinheim: VCH Publishers. [aLS, RE]

Gray, C. M., Koenig, P., Engel, A. K. & Singer, W. (1989) Oscillatory
responses in cat visual cortex exhibit inter-columnar synchronization
which reflects global stimulus properties. Nature 338:334-37. [aLS, IT,
SJT]

Gray, C. M. & Singer, W. (1989) Stimulus-specific neural oscillations in
orientation specific columns of the visual cortex. Proceedings of the
National Academy of Science. [aLS, RE]

Gross, H., Koerner, E., Boehme, H. & Pomierski, T. (1992) A neural network
hierarchy for data and knowledge controlled selective visual attention. In:
Artificial neural networks, 2, ed. I. Aleksander & J. Taylor.
North-Holland. [EK]

Grossberg, S. (1976) Adaptive pattern classification and universal recoding, II:
Feedback, expectation, olfaction, and illusions. Biological Cybernetics
23:187-202. [SG]

(1978) A theory of visual coding, memory, and development. In: Formal
theories of visual perception, ed. E. Leeuwenberg & H. Buffart.
Wiley. [SG]

ed. (1987a) The adaptive brain, vols. 1 & 2. Elsevier/North-Holland. [SG]

(1987b) Competitive learning: From interactive activation to adaptive
resonance. Cognitive Science 11:23-63. [MRWD]

(1988) Neural networks and natural intelligence. MIT Press. [SG]

Grossberg, S. & Mingolla, E. (1985a) Neural dynamics of form perception:
Boundary completion, illusory figures, and neon color spreading.
Psychological Review 92:173-211. [SG]

(1985b) Neural dynamics of perceptual grouping: Textures, boundaries, and
emergent segmentations. Perception & Psychophysics 38:141-71. [SG]

Grossberg, S. & Somers, D. (1991) Synchronized oscillations during
cooperative feature linking in a cortical model of visual perception.
Neural Networks 4:453-66. [SG]

(1992) Synchronized oscillations for binding spatially distributed feature
codes into coherent spatial patterns. In: Neural networks for vision and
image processing, ed. G. A. Carpenter & S. Grossberg. MIT
Press. [SG]

Guha, R. V. & Lenat, D. B. (1990) Cyc: A mid-term report. Artificial
Intelligence Magazine 11(3):32-59. [aLS]

Halford, G. S., Wilson, W. H., Guo, J., Gayler, R. W., Wiles, J. & Stewart,
J. E. M. (1993) Connectionist implications for processing capacity
limitations in analogies. In: Advances in connectionist and neural
computational theory, vol. 2, ed. K. J. Holyoak & J. A. Barnden.
Ablex. [GSH]

Hanson, S. J. & Kegl, J. (1987) PARSNIP: A connectionist network that learns
natural language grammar from exposure to natural language sentences.
Proceedings of the Ninth Annual Cognitive Science Society Conference,
Seattle, WA. [GWC]

Harnad, S. (1990) The symbol grounding problem. Physica D
42:335-46. [rLS]

Hatfield, H. (1991) Representation and rule instantiation in connectionist
networks. In: Connectionism and the philosophy of mind, ed. T. Horgan
& J. Tienson. Kluwer Academic. [aLS]

Hayes, P. J. (1977) In defense of logic. Proceedings of the Fifth Annual
International Joint Conference on Artificial Intelligence, Cambridge,
MA. [GWC]

Hayes, S. C. (1991) A relational control theory of stimulus equivalence. In:
Rule-governed behavior: Cognition, contingencies and instructional
control, ed. S. C. Hayes. Plenum. [PJH]




Hebb, D. O. (1949) The organization of behavior. Wiley. [aLS, GWC, GP]

Henderson, J. (1992) A connectionist parser for structure unification grammar.
Proceedings of the Thirtieth Annual Meeting of the Association for
Computational Linguistics. Association for Computational
Linguistics. [arLS]

Hendler, J. (1987) Integrating marker-passing and problem solving: A
spreading activation approach to improved choice in planning.
Erlbaum. [aLS, GH, MO]

Hinton, G. E. (1981) Implementing semantic networks in parallel hardware.
In: Parallel models of associative memory, ed. G. E. Hinton & J. A.
Anderson. Erlbaum. [aLS]

(1986) Learning distributed representations of concepts. Proceedings of the
Eighth Annual Conference of the Cognitive Science Society.
Erlbaum. [GD]

Hirst, G. (1987) Semantic interpretation and the resolution of ambiguity.
Cambridge University Press. [aLS, GH]

(1988) Resolving lexical ambiguity computationally with spreading activation
and Polaroid words. In: Lexical ambiguity resolution, ed. S. Small,
G. Cottrell & M. K. Tanenhaus. Morgan Kaufmann. [GH]

Hirst, G. & Charniak, E. (1982) Word sense and case slot disambiguation.
Proceedings of the Second National Conference on Artificial Intelligence,
Pittsburgh. [GH]

Holden, A. V. & Kryukov, V. I., eds. (1991) Neurocomputers and attention I
& II. Proceedings in nonlinear science. Manchester University
Press. [IT]

Hölldobler, S. (1990) CHCL: A connectionist inference system for Horn logic
based on the connection method and using limited resources. Technical
Report 90-042, International Computer Science Institute, Berkeley,
CA. [aLS, SH]

Horn, D., Sagi, D. & Usher, M. (1991) Segmentation, binding, and illusory
conjunctions. Neural Computation 3(4):510-25. [aLS]

Hubel, D. H. & Wiesel, T. N. (1962) Receptive fields, binocular interaction
and functional architectures of the cat's visual cortex. Journal of
Physiology 160:106-54. [WJF]

Hummel, J. E. & Biederman, I. (1992) Dynamic binding in a neural network
for shape recognition. Psychological Review 99:480-517. [aLS, SJT]

Hummel, J. E., Burns, B. & Holyoak, K. J. (in press) Analogical mapping by
dynamic binding: Preliminary investigations. In: Advances in
connectionist and neural computation theory, vol. 2: Analogical
connections, ed. K. J. Holyoak & J. A. Barnden. Ablex. [GSH, JEH]

Hummel, J. E. & Holyoak, K. J. (1992) Indirect analogical mapping.
Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society. Erlbaum. [JEH]

Humphreys, M. S., Bain, J. D. & Pike, R. (1989) Different ways to cue a
coherent memory system: A theory for episodic, semantic and procedural
tasks. Psychological Review 96(2):208-33. [GSH]

Ikeda, K., Otsuka, K. & Matsumoto, K. (1989) Maxwell-Bloch turbulence.
Progress of Theoretical Physics (suppl.) 99:295-324. [IT]
James,  W.  (1890)  Psychology  (Briefer  course).  Holt.  (GWC) 

Johannesma, P., Aertsen, A., Van den Boogaard, H., Eggermont, J. & Epping, W. (1986) From synchrony to harmony: Ideas on the function of neural assemblies and on the interpretation of neural synchrony. In: Brain theory, ed. G. Palm & A. Aertsen. Springer. [GP]

Johnson-Laird, P. N. (1983) Mental models. Cambridge University Press. [MIB, MO]

(1988) The computer and the mind. Harvard University Press. [MIB]

Johnson-Laird, P. N. & Byrne, R. M. J. (1991) Deduction. Erlbaum. [MO]

Just, M. A. & Carpenter, P. A., eds. (1977) Cognitive processes in comprehension. Erlbaum. [aLS]

Kaneko, K. (1989) Pattern dynamics in spatio-temporal chaos. Physica 34D:1-41. [IT]

(1990) Clustering, switching, hierarchical ordering and control in a network of chaotic elements. Physica 41D:137-72. [IT]

Kautz, H. A. & Selman, B. (1991) Hard problems for simple default logics. Artificial Intelligence 47(1-3):243-79. [aLS]

Keenan, J. M., Baillet, S. D. & Brown, P. (1984) The effects of causal cohesion on comprehension and memory. Journal of Verbal Learning and Verbal Behavior 23:115-26. [aLS]

Kintsch, W., ed. (1974) The representation of meaning in memory. Erlbaum. [aLS]

(1988) The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review 95:163-82. [aLS]
Kiper, D. C., Gegenfurtner, K. R. & Movshon, J. A. (1991) The effect of 40 Hz flicker on the perception of global stimulus properties. Society for Neuroscience Abstracts 17(2):479.4. [MPY]

Klayman, J. & Ha, Y. (1987) Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review 94:211-28. [SS]

Koerner, E. & Boehme, H. (1991) Organization of an episodic knowledge base in a neural network architecture with parallel-sequential processing modes for autonomous recognition and learning. In: Artificial neural networks, ed. T. Kohonen, K. Mäkisara, O. Simula & J. Kangas. Elsevier/North-Holland. [EK]

Koerner, E., Gross, H. & Boehme, H. (1991) Elementary cognitive mechanisms for knowledge based image interpretation. In: Proceedings of the International Workshop on Adaptive Learning and Neural Networks, ed. P. Bock, M. Loew, F. J. Radermacher & M. M. Richter. Ulm: Forschungsinstitut für Anwendungsorientierte Wissensverarbeitung. [EK]

Koerner, E., Gross, H. & Tsuda, I. (1990) Holonic processing in a model system of cortical processors. In: Biological complexity and information, ed. H. Shimizu. World Scientific. [EK]

Koerner, E., Salevski, H., Shimizu, H., Koerner, U. & Seifert, S. (submitted) A structured neural network model of hippocampus and its function as a nonspecific controller of cortical decision making and nontrivial learning. [EK]

Koerner, E., Shimizu, H. & Tsuda, I. (1987) Parallel in sequence: Towards the architecture of an elementary cortical processor. In: Parallel algorithms and architectures, ed. A. Albrecht, H. Jung & G. Mehlhorn. Akademie-Verlag. [EK]

Koerner, E., Tsuda, I. & Shimizu, H. (1987) Take-grant control, variable byte formation and processing parallel in sequence: Characteristics of a new type of holonic processor. In: Parallel algorithms and architecture, ed. A. Albrecht, H. Jung & G. Mehlhorn. Springer. [IT]

Kosslyn, S. M., Murphy, G. L., Bemesderfer, M. E. & Feinstein, K. J. (1977) Category and continuum in mental comparisons. Journal of Experimental Psychology: General 106:341-75. [PJH]

Kreiter, A. K., Engel, A. K. & Singer, W. (1992) Stimulus dependent synchronization in the caudal superior temporal sulcus of macaque monkeys. Society for Neuroscience Abstracts 18:11.11. [rLS]

Kreiter, A. K. & Singer, W. (1992) Oscillatory neuronal responses in the visual cortex of the awake macaque monkey. European Journal of Neuroscience 4:369-75. [aLS, IT]

Kruse, W., Eckhorn, R., Schanze, T. & Reitboeck, H. J. (1992) Stimulus-induced oscillatory synchronization is inhibited by stimulus-locked non-oscillatory synchronization in cat visual cortex: Two modes that might support feature linking. Society for Neuroscience Abstracts 18:131.3. [RE]

Kuramoto, Y. (1991) Collective synchronization of pulse-coupled oscillators and excitable units. Physica 50D:15-30. [IT]

Lado, F., Ribary, U., Ioannides, A., Volkman, J., Joliot, M., Mogilner, A. & Llinás, R. (1992) Coherent oscillations in motor and sensory cortices detected using MEG and MFT. Society for Neuroscience Abstracts 18:355.15. [rLS]

Lakoff, G. (1987) Women, fire, and dangerous things: What categories reveal about the mind. University of Chicago Press. [aLS, SS, DST]

Lakoff, G. & Johnson, M. (1980) Metaphors we live by. University of Chicago Press. [DST]

Lange, T. E. & Dyer, M. G. (1989) High-level inferencing in a connectionist network. Connection Science 1(2):181-217. [aLS, JAB, GWC]

Lehnert, W. G. & Ringle, M. H., eds. (1982) Strategies for natural language processing. Erlbaum. [aLS]

Lettvin, J. Y., Maturana, H. R., McCulloch, W. S. & Pitts, W. H. (1959) What the frog's eye tells the frog's brain. Proceedings of the Institute of Radio Engineers 47:1940-51. [WJF]

Levesque, H. J. (1988) Logic and the complexity of reasoning. Journal of Philosophical Logic 17:335-89. [aLS]

Levesque, H. J. & Brachman, R. J. (1985) A fundamental tradeoff in knowledge representation and reasoning. In: Readings in knowledge representation, ed. R. J. Brachman & H. J. Levesque. Morgan Kaufmann. [aLS, GWC]

Levi, J. N. (1978) The syntax and semantics of complex nominals. Academic Press. [GH]

Livingstone, M. S. (1991) Visually evoked oscillations in monkey striate cortex. Society for Neuroscience Abstracts 17:73.3. [rLS]

Lucas, M. M., Tanenhaus, M. K. & Carlson, G. N. (1990) Levels of representation in the interpretation of anaphoric reference and instrument inference. Memory and Cognition 18(6):611-31. [GH]

Lynch, G. (1986) Synapses, circuits, and the beginnings of memory. MIT Press. [aLS]

MacVicar, B. & Dudek, F. E. (1980) Dye-coupling between CA3 pyramidal cells in slices of rat hippocampus. Brain Research 196:494-97. [aLS]

Malloch, M. I., Oaksford, M. & Iddon, J. (1992) Impairments of reasoning, memory and planning in early stage Parkinsonism. Technical Report No. UWB-CNU-TR-13, Cognitive Neurocomputation Unit, University of Wales, Bangor. [MO]

BEHAVIORAL AND BRAIN SCIENCES (1993) 16:3
References/Shastri & Ajjanagadde: Association to reasoning

Mandelbaum, R. (1991) A robust model for temporal synchronisation of nodes: Description and simulation. Term Report, Department of Computer and Information Science, University of Pennsylvania. [aLS]
Mandelbaum, R. & Shastri, L. (1990) A robust model for temporal synchronisation of distant nodes. (Unpublished report.) [aLS]

Mani, D. R. & Shastri, L. (1991) Combining a connectionist type hierarchy with a connectionist rule-based reasoner. Proceedings of the Thirteenth Conference of the Cognitive Science Society. Erlbaum. [aLS]

(1992) A connectionist solution to the multiple instantiation problem using temporal synchrony. Proceedings of the Fourteenth Conference of the Cognitive Science Society. Erlbaum. [aLS]

Marr, D. (1971) Simple memory: A theory for archicortex. Philosophical Transactions of the Royal Society B 262:23-81. [aLS]

Martin, C. E. & Riesbeck, C. K. (1986) Uniform parsing and inferencing for learning. Proceedings of the Fifth National Conference on Artificial Intelligence, Philadelphia. [GH]

Matsumoto, K. & Tsuda, I. (1987) Extended information in one-dimensional maps. Physica 26D:347-57. [IT]

McAllester, D. A. (1990) Automatic recognition of tractability in inference relations. Memo 1215, MIT Artificial Intelligence Laboratory. [aLS]

McCarthy, J. (1988) Epistemological challenges for connectionism [Commentary on Smolensky]. Behavioral and Brain Sciences 11(1):44. [aLS]

McDermott, D. (1981) Artificial intelligence meets natural stupidity. In: Mind design, ed. J. Haugeland. MIT Press/Bradford Books. [DLM]

(1986) A critique of pure reason. Technical Report, Department of Computer Science, Yale University. [MO]

McKendall, T. (1991) A design for an answer extraction and display scheme for a connectionist rule-based reasoner. Unpublished report on work done for National Science Foundation, Research Experience for Undergraduates grant IRI 88-05465. [aLS]

McKoon, G. & Ratcliff, R. (1980) The comprehension processes and memory structures involved in anaphoric reference. Journal of Verbal Learning and Verbal Behavior 19:668-82. [aLS]

(1981) The comprehension processes and memory structures involved in instrumental inference. Journal of Verbal Learning and Verbal Behavior 20:671-82. [aLS]

(1986) Inferences about predictable events. Journal of Experimental Psychology: Learning, Memory, and Cognition 12:82-91. [aLS]

McMillan, C., Mozer, M. & Smolensky, P. (1991) The connectionist scientist game. Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society. Erlbaum. [GD]

McRoy, S. W. (1993) Abductive interpretation and re-interpretation of natural language utterances. Ph.D. dissertation, Department of Computer Science, University of Toronto. [GH]

McRoy, S. W. & Hirst, G. (1993) Abductive explanations of dialogue misunderstanding. Proceedings, Sixth Conference of the European Chapter of the Association for Computational Linguistics, Utrecht. [GH]

Merzenich, M. M., Recanzone, G., Jenkins, W. M., Allard, T. T. & Nudo, R. J. (1988) Cortical representational plasticity. In: Neurobiology of neocortex, ed. P. Rakic & W. Singer. Wiley. [JD]

Miller, G. A. (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63(2):81-97. [aLS, GSH, EK]

Milner, B. (1963) Effects of different brain lesions on card sorting. Archives of Neurology 9:90-100. [MO]

Minsky, M. (1975) A framework for representing knowledge. In: The psychology of computer vision, ed. P. H. Winston. McGraw-Hill. [aLS]

(1985) The society of mind. Simon & Schuster. [EK]

Mountcastle, V. B. (1957) Modality and topographic properties of single neurons of cat's somatic cortex. Journal of Neurophysiology 20:408-34. [WJF]

Mozer, M. C., Zemel, R. S. & Behrman, M. (1991) Learning to segment images using dynamic feature binding. Technical Report CU-CS-540-91, University of Colorado at Boulder. [aLS]

Newell, A. (1980) Harpy, production systems and human cognition. In: Perception and production of fluent speech, ed. R. Cole. Erlbaum. [aLS]

(1990) Unified theories of cognition. Harvard University Press. [arLS]

Newell, A. & Simon, H. A. (1972) Human problem solving. Prentice-Hall. [aLS]

Norman, D. A. & Shallice, T. (1985) Attention to action: Willed and automatic control of behaviour. In: Consciousness and self-regulation, vol. 4, ed. R. J. Davidson, G. E. Schwartz & D. Shapiro. Plenum. [MO]

Norvig, P. (1989) Marker passing as a weak method for text inferencing. Cognitive Science 13:569-620. [aLS, GH]

Oaksford, M. (1993) Mental models and the tractability of everyday reasoning. Behavioral and Brain Sciences 16(2):360-61. [MO]


Oaksford, M. & Chater, N. (1991) Against logicist cognitive science. Mind & Language 6:1-38. [MO]

(1992a) Reasoning theories and bounded rationality. In: Rationality, ed. K. Manktelow & D. Over. Routledge. [MO]

(1992b) Bounded rationality in taking risks and drawing inferences. Theory & Psychology 2:225-30. [MO]

(in press) Cognition and inquiry. Academic Press. [MO]

Oaksford, M., Malloch, M. I. & Swain, S. (1992a) Transitive inference in closed head injury: A single case study. Technical Report No. UWB-CNU-TR-12, Cognitive Neurocomputation Unit, University of Wales, Bangor. [MO]

Oaksford, M., Malloch, M. I., Watson, F. & Hargreaves, I. (1992b) Impairments of reasoning, memory and attention in frontal lobe damage: A single case study. Technical Report No. UWB-CNU-TR-11, Cognitive Neurocomputation Unit, University of Wales, Bangor. [MO]

Oaksford, M. & Stenning, K. (1992) Reasoning with conditionals containing negated constituents. Journal of Experimental Psychology: Learning, Memory & Cognition 18:835-54. [MO]

Oram, M. W. & Perrett, D. I. (1992) Time course of neural responses discriminating different views of the face and head. Journal of Neurophysiology 69:70-84. [SJT]

Pabst, M., Reitboeck, H. J. & Eckhorn, R. (1989) A model of preattentive region definition based on texture analysis. In: Models of brain function, ed. R. M. J. Cotterill. Cambridge University Press. [RE]

Palm, G. (1982) Neural assemblies: An alternative approach to artificial intelligence. Springer. [GP]

(1986) Associative networks and cell assemblies. In: Brain theory, ed. G. Palm & A. Aertsen. Springer. [GP]

(1990) Cell assemblies as a guideline for brain research. Concepts in Neuroscience 1:133-47. [GP]

Pelletier, F. J. (1982) Completely non-clausal, completely heuristically driven, automated theorem proving. Technical Report 82-7, Department of Computing Science, University of Alberta. [MRWD]

Pfeifer, R. & Verschure, P. (1992) Beyond rationalism: Symbols, patterns and behavior. Connection Science 4:313-25. [JD]

Pollack, J. B. (1988) Recursive auto-associative memory: Devising compositional distributed representations. Technical Report MCCS-88-124, Computing Research Laboratory, New Mexico State University. [GD]

(1990) Recursive distributed representations. Artificial Intelligence 46:77-105. [GD, GH]

Posner, M. I. & Snyder, C. R. R. (1975) Attention and cognitive control. In: Information processing and cognition: The Loyola Symposium, ed. R. L. Solso. Erlbaum. [aLS]

Potts, G. R., Keenan, J. M. & Golding, J. M. (1988) Assessing the occurrence of elaborative inferences: Lexical decision versus naming. Journal of Memory and Language 27:399-415. [aLS]

Quillian, M. R. (1968) Semantic memory. In: Semantic information processing, ed. M. Minsky. MIT Press. [aLS]

Ramesh, R., Verma, R. M., Krishnaprasad, T. & Ramakrishnan, I. V. (1989) Term matching on parallel computers. Journal of Logic Programming 6:213-28. [SH]

Reder, L. M. & Ross, B. H. (1983) Integrated knowledge in different tasks: The role of retrieval strategy on fan effects. Journal of Experimental Psychology: Learning, Memory, and Cognition 9:55-72. [aLS]

Reitboeck, H. J., Eckhorn, R., Arndt, M. & Dicke, P. (1990) A model for feature linking via correlated neural activity. In: Synergetics of cognition, Springer series in synergetics, vol. 45, ed. H. Haken & M. Stadler. Springer. [IT]

Reiter, R. (1980) A logic for default reasoning. Artificial Intelligence 13:81-132. [rLS]

Riesbeck, C. K. & Schank, R. C. (1989) Inside case-based reasoning. Erlbaum. [PRC]

Rips, L. J. (1983) Cognitive processes in propositional reasoning. Psychological Review 90:38-71. [MO]

Rohwer, R. A. (1993) A representation of representation applied to a discussion of variable binding. In: Neurodynamics and psychology, ed. M. Oaksford & G. Brown. Academic Press. [RR]

Rolls, E. T. (1991) Neural organisation of higher visual functions. Current Opinion in Neurobiology 1:274-78. [aLS, MPY]

Rotter, M. & Dorffner, G. (1990) Struktur und Konzeptrelationen in verteilten Netzwerken. In: Konnektionismus in Artificial Intelligence und Kognitionsforschung, ed. G. Dorffner. Springer. [GD]

Rumelhart, D. E. (1989) Toward a microstructural account of human reasoning. In: Similarity and analogical reasoning, ed. S. Vosniadou & A. Ortony. Cambridge University Press. [MO]

Rumelhart, D. E. & McClelland, J. L., eds. (1986) Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1. Bradford Books/MIT Press. [aLS]




Rumelhart, D. E., Smolensky, P., McClelland, J. L. & Hinton, G. E. (1986) Schemata and sequential thought processes in PDP models. In: Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2: Psychological and biological processes, ed. J. L. McClelland & D. E. Rumelhart. MIT Press. [MO]

Schank, R. C. & Abelson, R. P. (1977) Scripts, plans, goals and understanding. Erlbaum. [aLS]

Schanze, T. & Eckhorn, R. (1991) Synchronization statistics of stimulus-specific oscillatory events in cat visual cortex. In: Synapse, transmission, modulation, ed. N. Elsner & H. Penzlin. Thieme Verlag. [RE]

Schneider, W. & Shiffrin, R. M. (1977) Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review 84:1-66. [aLS]

Schubert, L. K. (1989) An episodic knowledge representation for narrative texts. Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann. [aLS]

Sejnowski, T. J. (1981) Skeleton filters in the brain. In: Parallel models of associative memory, ed. G. E. Hinton & J. A. Anderson. Erlbaum. [aLS]

Servan-Schreiber, D., Cleeremans, A. & McClelland, J. (1989) Encoding sequential structure in simple recurrent nets. In: Advances in neural information processing systems 1, ed. D. Touretzky. Morgan Kaufmann. [JWG]

Shallice, T. (1982) Specific impairments of planning. Philosophical Transactions of the Royal Society of London B 298:199-299. [MO]

Sharkey, N. E. (1992) The causal role of the constituents of superpositional representations. In: Cybernetics and systems '92, ed. R. Trappl. World Scientific. [GD]

Shastri, L. (1988a) Semantic networks: An evidential formulation and its connectionist realization. Pitman/Morgan Kaufmann. [arLS, PRC]

(1988b) A connectionist approach to knowledge representation and limited inference. Cognitive Science 12(3):331-92. [arLS, GWC]

(1990) Connectionism and the computational effectiveness of reasoning. Theoretical Linguistics 16(1):65-87. [aLS]

(1991) Relevance of connectionism to AI: A representation and reasoning perspective. In: Advances in connectionist and neural computation theory, vol. 1, ed. J. Barnden & J. Pollack. Ablex. [aLS]

(1992)  Encoding  higher-order  bindings  in  LCS  structures.  Working  notes  for 
the  NL(^-Project.  National  Science  Foundation.  (rLS] 

(1993a) A realization of preference rules using temporal synchrony (in preparation). [aLS]

(1993b) Learning evidential rules in SHRUTI (in preparation). [rLS]

Shastri, L. & Ajjanagadde, V. G. (1990) A connectionist representation of rules, variables and dynamic binding. Technical Report MS-CIS-90-05, Department of Computer and Information Science, University of Pennsylvania. [aLS]

Shastri, L. & Feldman, J. A. (1986) Semantic nets, neural nets, and routines. In: Advances in cognitive science, ed. N. Sharkey. Ellis Horwood/Wiley. [arLS]

Shiffrin, R. M. & Schneider, W. (1977) Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review 84:127-90. [aLS]

Shimizu, H. & Yamaguchi, Y. (1987) Synergetic computers and holonics: Information dynamics of semantic computers. Physica Scripta 36:970-85. [IT]

Shimizu, H., Yamaguchi, Y., Tsuda, I. & Yano, M. (1985) Pattern recognition based on holonic information dynamics. In: Complex systems: Operational approaches, ed. H. Haken. Springer Series in Synergetics, vol. 31. [EK]

Singer, M. & Ferreira, F. (1983) Inferring consequences in story comprehension. Journal of Verbal Learning and Verbal Behavior 22:437-48. [aLS]

Singer, W. (1987) Activity-dependent self-organization of synaptic connections as a substrate of learning. In: The neural and molecular bases of learning, ed. J.-P. Changeux & M. Konishi. Wiley. [JD]

Skarda, C. A. & Freeman, W. J. (1987) How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences 10:161-95. [WJF, IT]

Sloman, S. A. (1993) Feature-based induction. Cognitive Psychology 25 (in press). [SS]

Smolensky, P. (1988) On the proper treatment of connectionism. Behavioral and Brain Sciences 11:1-74. [GD, EK]

(1990) Tensor product variable binding and the representation of symbolic structure in connectionist systems. Artificial Intelligence 46(1-2):159-216. [aLS, GSH]

Squire, L. R. (1987) Memory and brain. Oxford University Press. [aLS]

Squire, L. R. & Zola-Morgan, S. (1991) The medial temporal lobe memory system. Science 253:1380-86. [aLS]

Stenning, K., Shepard, M. & Levy, J. (1988) On the construction of representations for individuals from descriptions in text. Language and Cognitive Processes 3(2):129-64. [aLS]

Stevens, C. F. (1989) How cortical interconnectedness varies with network size. Neural Computation 1:473-79. [JD]

Stolcke, A. & Wu, D. (1992) Tree matching with recursive distributed representations. AAAI-92 Workshop on Integrating Neural and Symbolic Processes, San Jose, CA. (Also available as Technical Report 92-025, International Computer Science Institute, Berkeley.) [GH]

Strehler, B. L. & Lestienne, R. (1986) Evidence on precise time-coded symbols and memory of patterns in monkey cortical neuronal spike trains. Proceedings of the National Academy of Science 83:9812-16. [aLS]

Strong, G. W. & Whitehead, B. A. (1989) A solution to the tag-assignment problem for neural nets. Behavioral and Brain Sciences 12:381-433. [aLS, GWS]

Sumida, R. A. & Dyer, M. (1989) Storing and generalizing multiple instances while maintaining knowledge-level parallelism. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence. Morgan Kaufmann. [aLS]

Thorpe, S. J., Celebrini, S., Trotter, Y. & Imbert, M. (1991) Dynamics of stereo processing in area V1 of the awake primate. European Journal of Neuroscience (Suppl.) 4:83. [SJT]

Thorpe, S. J., Celebrini, S., Trotter, Y., Pouget, A. & Imbert, M. (1989) Dynamic aspects of orientation coding in area V1 of the awake primate. European Journal of Neuroscience (Suppl.) 2:322. [SJT]

Thorpe, S. J. & Imbert, M. (1989) Biological constraints on connectionist models. In: Connectionism in perspective, ed. R. Pfeifer, Z. Schreter, F. Fogelman-Soulié & L. Steels. Elsevier. [arLS, SJT]

Tomabechi, H. & Kitano, H. (1989) Beyond PDP: The frequency modulation neural network approach. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence. Morgan Kaufmann. [aLS]

Touretzky, D. S. (1986) The mathematics of inheritance systems. Morgan Kaufmann/Pitman. [aLS]

(1990) BoltzCONS: Dynamic symbol structures in a connectionist network. Artificial Intelligence 46(1-2):5-46. [rLS, JAB]

Touretzky, D. S. & Hinton, G. E. (1988) A distributed connectionist production system. Cognitive Science 12(3):423-66. [aLS, GWC]

Tovee, M. J. & Rolls, E. T. (1992) Oscillatory activity is not evident in the primate temporal visual cortex with static stimuli. Neuroreport 3:369-71. [aLS, SJT, MPY]

Toyama, K., Kimura, M. & Tanaka, T. (1981) Cross correlation analysis of interneuronal connectivity in cat visual cortex. Journal of Neurophysiology 46(2):191-201. [aLS]

Treisman, A. & Gelade, G. (1980) A feature integration theory of attention. Cognitive Psychology 12:97-136. [aLS]

Tsuda, I. (1991) Chaotic itinerancy as a dynamical basis of hermeneutics in brain and mind. World Futures 32:167-84. [WJF, IT]

(1992) Dynamic link of memory: Chaotic memory map in nonequilibrium neural networks. Neural Networks 5:313-26. [IT]

Tulving, E. (1983) Elements of episodic memory. Oxford University Press. [aLS]

Tversky, A. & Kahneman, D. (1983) Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review 90:293-315. [SS]

Ullman, J. D. & van Gelder, A. (1988) Parallel complexity of logical query programs. Algorithmica 3:5-42. [aLS]

Valiant, L. G. (1988) Functionality in neural nets. Proceedings of the Conference on Artificial Intelligence, Saint Paul, MN. [GWC]

van Gelder, T. (1990) Compositionality: A connectionist variation on a classical theme. Cognitive Science 14:208-12. [GD]

(1991) Classical questions, radical answers: Connectionism and the structure of mental representations. In: Connectionism and the philosophy of mind, ed. T. Horgan & J. Tienson. Kluwer. [JWG]

Velmans.  M.  (1991)  Is  human  information  prcKcssing  conscious'^  [and  C^oni- 
mentary  thereon)  Behavioral  and  Brain  Sciences  14(4):651-726  |(T1] 

Vogels, R. & Orban, G. A. (1991) Quantitative study of striate single unit responses in monkeys performing an orientation discrimination task. Experimental Brain Research 84:1-11. [SJT]

von der Malsburg, C. (1981) The correlation theory of brain function. Internal Report 81-2, Department of Neurobiology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany. [aLS, WJF]

(1986) Am I thinking assemblies? In: Brain theory, ed. G. Palm & A. Aertsen. Springer. [aLS, GP]

vou  d<*r  Malsbiirg.  G.  Ac  SehneMder,  W  (1986)  A  neural  eenktail-p.irts  pro 
eesMir  Biological  Cybernetics  54  29-40  (aLS,  WJF.  F.k.  \J\ 

Wason, P. C. (1960) On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology 12:129-40. [SS]

(1966) Reasoning. In: New horizons in psychology, ed. B. Foss. Penguin. [MO]




Whitney, P. & Williams-Whitney, D. (1990) Toward a contextualist view of elaborative inferences. In: The psychology of learning and motivation, vol. 25, ed. A. C. Graesser & G. H. Bower. Academic Press. [GH]

Wickelgren, W. A. (1979) Chunking and consolidation: A theoretical synthesis of semantic networks, configuring in conditioning, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological Review 86(1):44-60. [aLS]

Wilensky, R. (1983) Planning and understanding: A computational approach to human reasoning. Addison-Wesley. [aLS]

Wu, D. (1989) A probabilistic approach to marker propagation. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence. Morgan Kaufmann. [GH]


(1992a) Automatic inference: A probabilistic basis for natural language interpretation. Ph.D. dissertation (Technical Report UCB/CSD 92/692), Division of Computer Science, University of California at Berkeley. [GH]

(1992b) Approximate maximum-entropy integration of syntactic and semantic constraints. AAAI-92 Workshop on Statistically-Based NLP Techniques, San Jose, CA. [GH]

Yao, Y., Freeman, W. J., Burke, B. & Yang, Q. (1991) Pattern recognition by a distributed neural network: An industrial application. Neural Networks 4:103-12. [WJF]

Young, M. P., Tanaka, K. & Yamane, S. (1991) On oscillating neuronal responses in monkey visual cortex. Society for Neuroscience Abstracts 17(1):73.9. [MPY]




Connection Science, Vol. 5, Nos. 3 & 4, 1993


Reflexive  Reasoning  with  Multiple  Instantiation 
in  a  Connectionist  Reasoning  System  with  a 
Type  Hierarchy 


D. R. MANI & LOKENDRA SHASTRI


We describe a hybrid knowledge representation and reasoning system that integrates a
rule-based reasoner with a type hierarchy and can accommodate multiple dynamic
instantiations of predicates. The system, which is an extension of the reasoner
described in Shastri and Ajjanagadde (1990), maintains and propagates variable
bindings using temporally synchronous (i.e. in-phase) firing of appropriate nodes, and
can perform a broad class of reasoning with extreme efficiency. The type hierarchy
allows the system to encode generic facts such as 'cats prey on birds' and rules such as
'if x preys on y then y is scared of x' and use them to infer that Tweety the canary
is scared of Sylvester the cat. The system can also encode qualified rules such as 'if an
animate agent collides with a solid object then the agent gets hurt'. The ability to
accommodate multiple dynamic instantiations of any predicate allows the system to
handle a much broader class of inferences, including those involving transitivity and
bounded recursion. The proposed system can answer queries in time which is
independent of the size of the knowledge base, and is only proportional to the length of the
shortest derivation of the query.

KEYWORDS: Binding problem, connectionism, knowledge representation, multiple
instantiation, reflexive reasoning, type hierarchy.

1.  Introduction 

Connectionist networks, or neural networks, have primarily been used to model
'low-level' cognitive phenomena including visual pattern recognition, speech
processing and effector control. For these tasks, connectionist networks offer
several advantages: massive parallelism, noise- and fault-tolerance, graceful
degradation and trainability. On the other hand, classical artificial intelligence
(AI) or symbol-processing systems have primarily focused on 'high-level'
processes involving 'reasoning' using rules and manipulating abstract knowledge.
Though symbol-processing systems are capable of knowledge representation and
rule-based reasoning, they have had the disadvantage of not being scalable: these
systems become slow and unusable as the size of the knowledge base increases.
This is especially true when modeling human cognition. Hybrid architectures are
an integration of the notions of neural and symbolic processing systems in an
attempt to get the best of both worlds.

We describe a knowledge representation and reasoning system which combines
notions from classical AI and connectionism. The motivation for developing
such a reasoning system, however, is not to 'simulate' classical symbol-processing
using connectionism. Instead, the effort is motivated by the belief that an
integration of neural and symbolic systems would result in models that retain the
essential representational and inferential powers of classical systems, and at the
same time constrain and limit the reasoning system in cognitively plausible ways.
Furthermore, the simplicity, efficiency and massive parallelism of connectionist
models combined with the symbol-processing capabilities of classical AI systems
would lead to an efficient and scalable reasoning system.

A crucial factor in the failure of classical symbol-processing systems at modeling
human cognition is their use of general-purpose paradigms. From a reasoning
perspective, the use of computationally intractable techniques like general-purpose
theorem proving will not lead to efficient, rapid and tractable reasoning systems.
To make reasoning tractable, constraints and restrictions will have to be imposed
in order to limit the reasoning capability of the system in several ways. An ad hoc
choice of these constraints will result in a system which is tractable, but probably
not very useful. By using connectionism as our underlying paradigm, and by turning
to cognitive science, psychology and neuroscience for a realistic set of constraints,
we hope to develop a model of tractable reasoning that not only retains the essential
representational and inferential powers of classical systems, but also limits the
resulting system in cognitively plausible ways.

In developing a connectionist knowledge representation and reasoning system,
one of the key issues that needs to be addressed is the dynamic variable binding
problem (Feldman, 1982; von der Malsburg, 1986). Shastri and Ajjanagadde
(1993a, 1990; Ajjanagadde & Shastri, 1991) have described a solution to the
variable binding problem and shown that the solution leads to the design of a
connectionist reasoning system that can represent systematic knowledge involving
n-ary predicates and variables, and perform a broad class of reasoning with
extreme efficiency. The system can store both specific situations (facts) and
general systematic relationships in the domain (rules). The time taken by the
reasoning system to draw an inference is only proportional to the length of the
chain of inference, and is independent of the number of rules and facts encoded by
the system. The reasoning system maintains and propagates variable bindings
using temporally synchronous (i.e. in-phase) firing of appropriate nodes. The
solution to the variable binding problem allows the system to maintain and
propagate a large number of bindings simultaneously as long as the number of
distinct entities participating in the bindings during any given episode of
reasoning remains bounded. Reasoning in the proposed system is the transient but
systematic flow of rhythmic patterns of activation, where each phase in the
rhythmic pattern corresponds to a distinct entity involved in the reasoning
process, and where variable bindings are represented as the synchronous firing of
appropriate argument and entity (filler) nodes. A fact behaves as a temporal
pattern matcher that becomes 'active' when it detects that the bindings
corresponding to it are present in the system's pattern of activity. Rules are
interconnection patterns that propagate and transform rhythmic patterns of activity
across relational structures.

The system attempts to model efficient, effortless and spontaneous reasoning
over a large body of knowledge. Such reasoning has been described as reflexive
reasoning (Shastri, 1990). The system as described in Shastri and Ajjanagadde
(1990), however, has some limitations. In this paper, we overcome some of these
limitations by extending the basic system so that it can effectively draw a wider
class of inferences. In particular, we describe how (i) the rule-based component
can be interfaced with a type hierarchy and (ii) the inferences drawn by the
reasoning system may involve multiple dynamic instantiations of the same
predicate (Barnden & Pollack, 1991; Dyer, 1991). The extended system continues to
be scalable and can reason rapidly with large knowledge bases. The work also
leads to several predictions which have cognitive and psychological implications,
and offers fresh insight into the nature of reflexive reasoning (Sections 5.6 and
5.7).

Several other researchers have proposed connectionist knowledge representation
and reasoning systems using a variety of techniques (Section 6). These
include the use of dynamic connections (Feldman, 1982), parallel constraint
satisfaction (Touretzky & Hinton, 1988), position specific encoding (Barnden &
Srinivas, 1991), tensor product representations (Dolan & Smolensky, 1989) and
signatures (Lange & Dyer, 1989).

1.1. The Need for a Type Hierarchy

Human agents can reason with types, categories or concepts as effectively as they
can reason with instances or individuals. If we know that Sylvester is a cat and
Tweety is a canary, using the knowledge that 'cats prey on birds', we can
spontaneously infer that 'Sylvester preys on Tweety'. Such inferences may be
performed efficiently by organizing concepts into a type hierarchy so that we can
quickly traverse the hierarchy to find out facts like 'canaries are birds', 'birds and
cats are animals', and so on. Though these inferences can be drawn without the
use of a type hierarchy (by encoding the type information directly in the rule
base), using a separate type hierarchy substantially improves reasoning efficiency,
especially since the inferences drawn in the type hierarchy are used repeatedly in
reflexive reasoning (also see Section 5).

Interaction between the type hierarchy and the rule base further facilitates
inferences that the reasoning system can draw. Continuing with the
Tweety-Sylvester example, if we knew the rule 'if x preys on y then y is scared of x' we
can infer, using the type hierarchy, that 'Tweety is scared of Sylvester'. Moreover,
the type hierarchy allows rule-like knowledge of the form 'cats prey on birds' to
be encoded as the fact preys-on(Cat,Bird), and hence allows this knowledge not
only to be used during reasoning but also to be retrieved. Without a type
hierarchy, this knowledge would be encoded as the rule ∀x,y cat(x) ∧ bird(y) =>
preys-on(x,y). Consequently, it would participate in reasoning, but would not be
retrievable per se. A type hierarchy also allows the use of non-specific instances of
types and can therefore represent facts like 'there is a cat that loves all birds'.

The encoding of context-sensitive rules, where the firing of the rule is
dependent on the type of the role fillers, is facilitated by the use of types. For
example, if a ball hits a wall, we would not worry about the ball getting hurt. But
we would have no difficulty in inferring that John would be hurt if he ran into a
wall. Implicitly, we could be thought of as applying the rule 'if an animate agent
collides with a solid object, the animate agent would get hurt'. This notion can be
formalized by the use of typed variables, and the rule stated as:
∀x:animate, y:solid-obj collide(x,y) => hurt(x).
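
To make the typed rule concrete, the following is a minimal symbolic sketch in Python. It is not the connectionist encoding described in this paper; it only illustrates how type restrictions on the variables gate the firing of collide(x,y) => hurt(x). The is-a links and the names `IS_A`, `has_type` and `infer_hurt` are assumptions made for this illustration.

```python
# Hypothetical is-a links (child -> parent); illustrative only, not from the paper.
IS_A = {
    "John": "Person",
    "Person": "Animate",
    "Ball": "SolidObj",
    "Wall": "SolidObj",
}


def has_type(entity, wanted):
    """Walk up the is-a chain from `entity`; True if `wanted` is reached."""
    node = entity
    while node is not None:
        if node == wanted:
            return True
        node = IS_A.get(node)
    return False


def infer_hurt(x, y):
    """Apply  forall x:animate, y:solid-obj  collide(x,y) => hurt(x)  to collide(x, y).

    Returns the inferred fact, or None when the type restrictions are not met.
    """
    if has_type(x, "Animate") and has_type(y, "SolidObj"):
        return ("hurt", x)
    return None


print(infer_hurt("John", "Wall"))  # John is animate, so the rule fires
print(infer_hurt("Ball", "Wall"))  # a ball is not animate, so nothing is inferred
```

In this sketch the type check is an explicit walk up the hierarchy; in the system described here the same effect is meant to arise from activation spreading through the type hierarchy.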

In order that our reasoning system be able to reason effectively in the
situations mentioned above, we need to augment the rule base with a type
hierarchy. A brief outline of how this could be done was described in Shastri and
Ajjanagadde (1990). In this paper, we describe a detailed solution to the problem.

1.2. The Need for Multiple Dynamic Instantiations of Predicates

The need for multiple instantiation arises in two situations. The simpler of the
two cases, multiple dynamic instantiations of concepts in the type hierarchy,
arises when two objects of the same type are being represented. If the system has
simultaneously to represent 'cats are animals' and 'birds are animals' in its state
of activation, the concept representing animal will have to fire in synchrony with
both the cat concept and the bird concept. The more general problem of
representing multiple dynamic instantiations of predicates arises during reflexive
reasoning as brought out by the following examples. If we know that Mary is
John's spouse, we would not have any difficulty in realizing that John is Mary's
spouse. In other words, given spouse-of(Mary,John) we can reflexively answer
'yes' to the query spouse-of(John,Mary)? Such behavior would require the
spouse-of predicate to be instantiated twice: once with spouse-of(John,Mary)
and again with spouse-of(Mary,John). As another example, consider the
situation in which we know that Mary is older than John's father. If we now hear that
John married Mary, we can instantly sense the unusualness of the situation, since
Mary is obviously much older than John. But the fact that Mary is older than
John has not been explicitly stated. This would suggest that we may have inferred
older-than(Mary,John) using the facts older-than(Mary,John's-father) and
older-than(John's-father,John), and the transitive nature of the older-than
predicate. To model this scenario in the reasoning system, we would need
simultaneously to represent three instantiations of older-than. Similarly, we can,
without conscious deliberation, infer that John may be jealous of Tom if we know
that John loves Mary and Mary loves Tom. Here again, we would need the ability
to represent multiple instantiations of the loves predicate to capture the situation.
Thus, a system for modeling reflexive reasoning should be capable of representing
multiple instantiations of predicates.
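
The older-than example can be sketched symbolically to show why a limit of one instantiation per predicate blocks the inference. The Python below is an illustrative toy, not the mechanism of Section 4: `prove_by_transitivity` and `max_instantiations` are hypothetical names, and the check simply counts how many instantiations of the predicate must be held at the same time.

```python
def prove_by_transitivity(pred_facts, a, c, max_instantiations):
    """Try to derive pred(a, c) from stored pred(a, b) and pred(b, c).

    `pred_facts` is a set of (x, y) pairs for a single transitive predicate.
    Returns the set of instantiations that must be active simultaneously,
    or None if no derivation fits within `max_instantiations`.
    """
    after_a = {y for (x, y) in pred_facts if x == a}
    before_c = {x for (x, y) in pred_facts if y == c}
    for b in sorted(after_a & before_c):
        # The query itself plus the two supporting facts are all active at once.
        active = {(a, c), (a, b), (b, c)}
        if len(active) <= max_instantiations:
            return active
    return None


older_than = {("Mary", "John's-father"), ("John's-father", "John")}

# With room for three instantiations the inference goes through.
print(prove_by_transitivity(older_than, "Mary", "John", max_instantiations=3))
# With at most one instantiation per predicate it cannot be represented.
print(prove_by_transitivity(older_than, "Mary", "John", max_instantiations=1))
```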

The system described in Shastri and Ajjanagadde (1990) has the limitation
that any predicate in the reasoner can be instantiated at most once. In this paper,
we describe how this system can be extended to deal with multiple instantiations
of predicates in the reasoner as well as multiple instantiations of concepts in the
type hierarchy.

1.3. Overview

We begin with a brief overview of the basic rule-based reasoning system (Section
2). The realization of the type hierarchy is described in Section 3, followed by the
specification of the multiple instantiation mechanisms in Section 4. Section 5
describes how rules, facts and queries are handled by the extended reasoning
system. This section also describes the constraints introduced by the reasoning
system and their significance. We then conclude with a brief discussion of related
work, the relevance of this work and possible future research directions.

Figure 1. (a) An example encoding of rules and facts. The network encodes the
following rules and facts: ∀x,y,z give(x,y,z) => own(y,z), ∀x,y buy(x,y) =>
own(x,y), ∀x,y own(x,y) => can-sell(x,y), give(John,Mary,Book1), buy(John,x)
and own(Mary,Ball1).

Throughout the paper, we will mostly concern ourselves with backward
reasoning, unless explicitly stated otherwise.

2.  The  Basic  Rule-based  Reasoning  System 

A brief description of the basic reasoning system is provided here. The reader is
referred to Shastri and Ajjanagadde (1990) for a detailed exposition of the
reasoning system and its characteristics. Figure 1(a) illustrates how long-term
knowledge is encoded in the rule-based reasoning system. The network shown in
Figure 1(a) encodes the following rules and facts:

∀x,y,z give(x,y,z) => own(y,z)

∀x,y buy(x,y) => own(x,y)

∀x,y own(x,y) => can-sell(x,y)

give(John,Mary,Book1)

buy(John,x)

own(Mary,Ball1).

Figure 1. (b) Activation trace for the query can-sell(Mary,Book1)? ('Can Mary
sell Book1?'). The inputs (to nodes e:can-sell, cs-obj, p-seller, Book1 and Mary)
needed to pose the query are also shown. In other activation trace diagrams to
follow, these inputs will not be explicitly shown.


The rule ∀x,y,z give(x,y,z) => own(y,z) states that 'if x gives z to y, then y owns
z'. The other two rules are interpreted similarly. The facts give(John,Mary,Book1)
and own(Mary,Ball1) represent 'John gave Mary Book1' and 'Mary owns Ball1',
respectively, while buy(John,x) states that 'John bought something'.

The encoding of rules and facts makes use of several types of nodes (see
Figure 2): ρ-btu nodes (depicted as circles), τ-and nodes (depicted as pentagons)
and τ-or nodes (depicted as triangles). These nodes have the following idealized
behavior. If ρ-btu node A is connected to another ρ-btu node B, then the activity
of node B will synchronize with the activity of node A. In particular, a periodic
firing of A will lead to a periodic and in-phase firing of B. We assume that ρ-btu
nodes can respond in this manner as long as the period of firing, π, lies in the
interval [π_min, π_max]. This interval can be interpreted as defining the frequency
range over which ρ-btu nodes can sustain a synchronized response. For a
discussion of biologically motivated values for these parameters, see Shastri and
Ajjanagadde (1993a). A τ-and node behaves like a temporal AND node, and
becomes active on receiving an uninterrupted pulse train. On becoming active, a
τ-and node produces a pulse train similar to the input pulse train. A τ-or node,
on the other hand, becomes active on receiving any activation; its output is a pulse
whose width and period equal π_max. Figure 2 summarizes the behavior of these
nodes for the idealized case of oscillatory inputs.

Figure 2. Input-output behavior for the ρ-btu, τ-and and τ-or nodes used in
the reasoning system. ρ-btu and τ-and nodes can fire with any period in
the interval [π_min, π_max]; τ-or nodes always fire with period π_max.

The maximum number of distinct entities that may participate in an episode
of reasoning equals ⌊π/ω⌋, where π is the period of oscillation. We define ω to be
the width of the window of synchronization: nodes firing with a lag or lead of
less than ω/2 would be considered to be in synchrony. The encoding also makes
use of inhibitory modifiers, i.e. links that impinge upon and inhibit other links. A
pulse propagating along an inhibitory modifier will block a pulse propagating
along the link it impinges upon. In Figure 1(a), inhibitory modifiers are shown as
links ending in solid circles.
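
The capacity bound and the synchrony criterion can be written down directly. The following Python sketch assumes firing times, the period π and the window ω are given in the same (arbitrary) time unit; the function names and the numeric values in the usage lines are illustrative assumptions, not parameters taken from the paper.

```python
import math


def max_entities(period, window):
    """Upper bound on distinct entities per episode: floor(period / window)."""
    return math.floor(period / window)


def in_synchrony(t_a, t_b, period, window):
    """True if two firing times differ by a lag or lead of less than window / 2.

    The difference is measured modulo the period, so a spike just before the
    end of one cycle counts as close to a spike at the start of the next.
    """
    diff = abs(t_a - t_b) % period
    lag = min(diff, period - diff)
    return lag < window / 2


# Illustrative values only: a period of 30 time units and a window of 6.
print(max_entities(30, 6))
print(in_synchrony(0, 2, 30, 6))   # lag of 2 is inside the window
print(in_synchrony(0, 5, 30, 6))   # lag of 5 is outside the window
```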

Each entity in the domain is encoded by a ρ-btu node. An n-ary predicate P
is encoded by a pair of τ-and nodes and n ρ-btu nodes, one for each of its n
arguments. One of the τ-and nodes is referred to as the enabler, e:P, and the other
as the collector, c:P. In Figure 1(a), enablers point upwards while collectors point
downwards. The enabler e:P becomes active whenever the system is being
queried about P. On the other hand, the system activates the collector c:P of a
predicate P whenever the system wants to assert that the current dynamic
bindings of the arguments of P follow from the knowledge encoded in the system.
A rule is encoded by connecting the collector of the antecedent predicate to the
collector of the consequent predicate, the enabler of the consequent predicate to
the enabler of the antecedent predicate, and by connecting the arguments of the
consequent predicate to the arguments of the antecedent predicate in accordance
with the correspondence between these arguments specified in the rule. A fact is
encoded using a τ-and node that receives an input from the enabler of the
associated predicate. This input is modified by inhibitory modifiers from the
argument nodes of the associated predicate. If an argument is bound to an entity
in the fact, then the modifier from such an argument node is in turn modified by
an inhibitory modifier from the appropriate entity node. The output of the τ-and
node is connected to the collector of the associated predicate. Figure 1(a) shows
the encoding of the facts give(John,Mary,Book1), own(Mary,Ball1) and
buy(John,x). Note that the fact buy(John,x) would be triggered only if the
second argument of the buy predicate is unbound, i.e. the object which John
bought is unspecified.
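
The behavior of a fact node can be illustrated with a small discrete-phase simulation. The Python sketch below is a simplification made for illustration: each period is divided into numbered phase slots, an argument or entity either fires in one slot or is silent (`None`), and the fact's τ-and node is considered active only if the enabler's pulse train is never interrupted. The function name, the slot representation and the argument name `giver` are assumptions of this sketch.

```python
def fact_node_active(enabler_on, arg_phase, entity_phase, fact_bindings, n_phases):
    """Decide whether the tau-and node of a fact receives an uninterrupted input.

    arg_phase:     argument name -> phase slot in which it fires, or None if unbound
    entity_phase:  entity name   -> phase slot in which it fires
    fact_bindings: argument name -> entity bound in the stored fact, or None
    """
    if not enabler_on:
        return False
    for slot in range(n_phases):
        for arg, filler in fact_bindings.items():
            arg_fires = arg_phase.get(arg) == slot
            # The entity node blocks the argument's inhibitory modifier only
            # when it fires in the same slot as the argument.
            blocked = filler is not None and entity_phase.get(filler) == slot
            if arg_fires and not blocked:
                return False  # the enabler's pulse train is interrupted
    return True


entities = {"Mary": 0, "Book1": 1}

# Query give(x,Mary,Book1)? against the fact give(John,Mary,Book1).
print(fact_node_active(
    True,
    {"giver": None, "recip": 0, "g-obj": 1},
    entities,
    {"giver": "John", "recip": "Mary", "g-obj": "Book1"},
    n_phases=2,
))

# Query own(Mary,Book1)? against the fact own(Mary,Ball1).
print(fact_node_active(
    True,
    {"owner": 0, "o-obj": 1},
    entities,
    {"owner": "Mary", "o-obj": "Ball1"},
    n_phases=2,
))
```

The same function reproduces the remark about buy(John,x): because the stored fact leaves the second argument unbound, any active binding on that argument interrupts the input.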

2.1. The Inference Process

Posing a query to the system involves specifying the query predicate and the
argument bindings specified in the query. In the proposed system, this is done by
simply activating the relevant nodes in the manner described below. Let us
choose an arbitrary point in time, say t0, as our point of reference for initiating
the query. We assume that the system is in a quiescent state just prior to t0. The
query predicate is specified by activating the enabler of the query predicate with
a pulse train of width and periodicity π starting at time t0.

The argument bindings specified in the query are communicated to the
network as follows. Let the argument bindings in the query involve n distinct
entities: c1, ..., cn. With each ci, associate a delay δi such that no two delays are
within ω of one another and the longest delay is less than π - ω. As mentioned
earlier, ω is the width of the window of synchrony and π is the period of
oscillation. Each of these delays may be viewed as a distinct phase within the
period between t0 and t0 + π. Now the argument bindings of an entity ci are
indicated to the system by providing an oscillatory spike train of periodicity π
starting at t0 + δi to ci and all arguments to which ci is bound. This is done for
each entity ci (1 ≤ i ≤ n) and amounts to representing argument bindings by the
in-phase or synchronous activation of the appropriate entity and argument nodes.
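
The procedure for posing a query can be sketched as follows. This Python example is one possible reading of the conditions above: delays are spaced exactly ω apart (treating 'within ω' as strictly less than ω), and the longest delay must be strictly less than π - ω. The function names, the spacing choice and the numeric values are assumptions of this sketch.

```python
def assign_delays(entities, period, window):
    """Give the i-th entity the delay i * window, one distinct phase per entity."""
    delays = {entity: i * window for i, entity in enumerate(entities)}
    if delays and max(delays.values()) >= period - window:
        raise ValueError("too many entities for this period and window")
    return delays


def spike_times(t0, delay, period, n_periods):
    """Firing times of an oscillatory spike train starting at t0 + delay."""
    return [t0 + delay + k * period for k in range(n_periods)]


def query_inputs(bindings, period, window, t0=0, n_periods=3):
    """Spike trains for every entity and for each argument bound to it.

    `bindings` maps an argument name to the entity it is bound to.
    """
    entities = list(dict.fromkeys(bindings.values()))  # distinct, in order
    delays = assign_delays(entities, period, window)
    trains = {}
    for arg, entity in bindings.items():
        trains[entity] = spike_times(t0, delays[entity], period, n_periods)
        trains[arg] = trains[entity]  # the argument fires in phase with its filler
    return trains


# Inputs for can-sell(Mary,Book1)? with illustrative period 30 and window 6.
for node, times in query_inputs({"p-seller": "Mary", "cs-obj": "Book1"}, 30, 6).items():
    print(node, times)
```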

We illustrate the reasoning process with the help of an example. Consider the
query can-sell(Mary,Book1)? (i.e. 'Can Mary sell Book1?'). This query is posed
by providing inputs to the entities Mary and Book1, the arguments p-seller and
cs-obj, and the enabler e:can-sell, as shown in Figure 1(b). Mary and p-seller
receive in-phase activation and so do Book1 and cs-obj. Let us refer to the phase
of activation of Mary and Book1 as ρ1 and ρ2, respectively. As a result of these
inputs, Mary and p-seller will fire synchronously in phase ρ1 of every period of
oscillation, while Book1 and cs-obj will fire synchronously in phase ρ2 of every
period of oscillation. The node e:can-sell will also oscillate and generate a pulse
train of periodicity and pulse width π. The activations from the arguments
p-seller and cs-obj reach the arguments owner and o-obj of the own predicate,
and, consequently, starting with the second period of oscillation, owner and o-obj
become active in ρ1 and ρ2, respectively. At the same time, the activation from
e:can-sell activates e:own. The system has, essentially, created dynamic bindings
for the arguments of predicate own. Mary has been bound to the argument
owner, and Book1 has been bound to the argument o-obj. These newly created
bindings in conjunction with the activation of e:own can be thought of as
encoding the query own(Mary,Book1)? (i.e. 'Does Mary own Book1?'). The
τ-and node associated with the fact own(Mary,Ball1) does not match the query
and remains inactive. The activations from owner and o-obj reach the arguments
recip and g-obj of give, and buyer and b-obj of buy, respectively. Thus,
beginning with the third period of oscillation, arguments recip and buyer become
active in ρ1, while arguments g-obj and b-obj become active in ρ2. In essence, the
system has created new bindings for the predicates give and buy that can be
thought of as encoding two new queries: give(x,Mary,Book1)? (i.e. 'Did someone
give Mary Book1?') and buy(Mary,Book1)? Observe that now the τ-and node
associated with the fact give(John,Mary,Book1) (this is the τ-and node labeled
F1 in Figure 1(a)) becomes active as a result of uninterrupted activation from
e:give. The inhibitory inputs from recip and g-obj are blocked by the in-phase
inputs from Mary and Book1, respectively. The activation from this τ-and node
causes c:give, the collector of give, to become active. The output from c:give in
turn causes c:own to become active and transmit an output to c:can-sell.
Consequently, c:can-sell, the collector of the query predicate can-sell, becomes
active (refer to Figure 1(b)), resulting in an affirmative answer to the query
can-sell(Mary,Book1)?
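
The chain of queries in this example can be reproduced with a purely symbolic backward chainer. The Python sketch below is not the network itself: enabler-to-enabler links become recursive calls, the collector response becomes the returned truth value, and an unbound argument is written as `None`. The data layout (`RULES`, `ARITY`, `FACTS`), the function names and the depth limit are assumptions made for illustration.

```python
# (antecedent, consequent, {consequent argument index -> antecedent argument index})
RULES = [
    ("give", "own", {0: 1, 1: 2}),      # give(x,y,z) => own(y,z)
    ("buy", "own", {0: 0, 1: 1}),       # buy(x,y)    => own(x,y)
    ("own", "can-sell", {0: 0, 1: 1}),  # own(x,y)    => can-sell(x,y)
]
ARITY = {"give": 3, "buy": 2, "own": 2, "can-sell": 2}
FACTS = [
    ("give", ("John", "Mary", "Book1")),
    ("buy", ("John", None)),            # buy(John,x): second argument unbound
    ("own", ("Mary", "Ball1")),
]


def fact_matches(fact_args, query_args):
    """A bound query argument must meet the same entity in the fact.

    Unbound query arguments (None) never inhibit the fact node.
    """
    return all(q is None or q == f for f, q in zip(fact_args, query_args))


def answer(pred, args, depth=0, max_depth=8):
    """Return True if the query pred(args)? follows from FACTS and RULES."""
    if depth > max_depth:
        return False
    if any(p == pred and fact_matches(f_args, args) for p, f_args in FACTS):
        return True
    for antecedent, consequent, arg_map in RULES:
        if consequent != pred:
            continue
        # Bindings flow from consequent arguments to antecedent arguments.
        sub_query = [None] * ARITY[antecedent]
        for cons_i, ante_i in arg_map.items():
            sub_query[ante_i] = args[cons_i]
        if answer(antecedent, tuple(sub_query), depth + 1, max_depth):
            return True
    return False


print(answer("can-sell", ("Mary", "Book1")))  # via own(Mary,Book1)? and give(x,Mary,Book1)?
print(answer("can-sell", ("John", "Book1")))
```

Unlike the network, this sketch explores the rules one after another; in the system described here all applicable rules are pursued in parallel, which is why response time depends only on the depth of the derivation.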

3.  The  Type  Hierarchy 

3.1. Overview

Figure 3(a) gives an overview of the reasoning system augmented with a type
hierarchy. The rule-based part of the network encodes the rule ∀x,y
preys-on(x,y) => scared-of(y,x) (i.e. 'if x preys on y, then y is scared of x'), and the
facts ∀x:Cat, y:Bird preys-on(x,y) and ∃x:Cat ∀y:Bird loves(x,y). The former fact
is equivalent to preys-on(Cat,Bird) and amounts to 'cats prey on birds'. The
latter amounts to 'there is a cat that loves all birds'. The network on the right in
Figure 3(a) encodes the following is-a relationships: is-a(Bird,Animal),
is-a(Cat,Animal), is-a(Robin,Bird), is-a(Canary,Bird), is-a(Chirpy,Robin),
is-a(Tweety,Canary) and is-a(Sylvester,Cat).

3.1.1. Interpreting facts. Facts involving typed variables are interpreted in the
following manner:

• A typed, universally quantified variable is treated as being equivalent to its
type. Also, any entity directly specified in a fact is treated as a substitute for a
typed universal variable. Thus, ∀x:Cat, y:Bird preys-on(x,y), ∀x:Cat
preys-on(x,Bird) and ∀y:Bird preys-on(Cat,y) are all encoded as preys-on(Cat,Bird).

• A typed, existentially quantified variable is encoded using a unique
subconcept of the associated type. Thus, in Figure 3(a), ∃x:Cat ∀y:Bird loves(x,y)
is encoded as loves(Cat-1,Bird), where Cat-1 is assumed to be a unique
instance of Cat.

The example in Section 3.4 clarifies these notions. Note that this scheme deals
only with existential variables outside the scope of universally quantified
variables.
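
The two interpretation rules amount to a small translation step, which can be sketched as follows. In this Python illustration each argument of a fact is tagged as `'forall'` (typed universal variable), `'exists'` (typed existential variable) or `'entity'` (a directly named entity); the tags, the function name `encode_fact` and the per-type counter are assumptions of this sketch. The naming pattern Cat-1 follows the example in the text.

```python
def encode_fact(pred, args, counters):
    """Translate a fact with typed, quantified arguments into its stored form.

    args:     list of (kind, name) pairs, kind in {'forall', 'exists', 'entity'}
    counters: dict used to number the unique instances created per type
    Returns (encoded_fact, new_is_a_links).
    """
    encoded, new_links = [], []
    for kind, name in args:
        if kind == "exists":
            # An existential variable becomes a fresh, unique instance of its type.
            counters[name] = counters.get(name, 0) + 1
            instance = f"{name}-{counters[name]}"
            new_links.append((instance, name))  # is-a(instance, type)
            encoded.append(instance)
        else:
            # A universal variable is replaced by its type; an entity stays as it is.
            encoded.append(name)
    return (pred, tuple(encoded)), new_links


counters = {}
print(encode_fact("preys-on", [("forall", "Cat"), ("forall", "Bird")], counters))
print(encode_fact("loves", [("exists", "Cat"), ("forall", "Bird")], counters))
```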

Figure 3. (a) Interaction between the reasoner and the type hierarchy. The rule
base encodes the rule ∀x,y preys-on(x,y) => scared-of(y,x) and the facts
∀x:Cat, y:Bird preys-on(x,y) and ∃x:Cat ∀y:Bird loves(x,y). The type hierarchy
encodes the following is-a relations: is-a(Bird,Animal), is-a(Cat,Animal),
is-a(Robin,Bird), is-a(Canary,Bird), is-a(Chirpy,Robin), is-a(Tweety,Canary) and
is-a(Sylvester,Cat).


For now, let us assume the following: each type or instance is encoded as a
ρ-btu node; each conceptual is-a relationship such as is-a(A,B) is encoded using
two connectionist links (a bottom-up link from A to B and a top-down link from
B to A); and the top-down and bottom-up links can be enabled selectively by
built-in control mechanisms.

Figure 3. (b) Trace of spreading activation for the query
scared-of(Tweety,Sylvester)? ('Is Tweety scared of Sylvester?').

The time course of activation for the query scared-of(Tweety,Sylvester)? ('Is
Tweety scared of Sylvester?') is given in Figure 3(b). The query is posed by
turning on e:scared-of and activating the nodes Tweety and Sylvester in
synchrony with the first (scaree) and second (scarer) arguments of scared-of,
respectively. The bottom-up links emanating from Tweety and Sylvester are also
enabled. In the rule base, activation from scaree spreads to prey, the second
argument of the preys-on predicate. Similarly, the activation of scarer spreads to
predator. At the same time, e:scared-of activates e:preys-on. As a result, the
initial query scared-of(Tweety,Sylvester)? is rephrased as
preys-on(Sylvester,Tweety)? ('Does Sylvester prey on Tweety?'). Concurrently with
activation spread in the rule base, activation also propagates in the type hierarchy. This
causes Canary and Bird to fire in synchrony with Tweety, and Cat in synchrony
with Sylvester. The net result of activation propagation in the rule base and type
hierarchy is to transform the query scared-of(Tweety,Sylvester)? into the query
preys-on(Cat,Bird)? (refer to Figure 3(b)). The latter query matches the stored
fact preys-on(Cat,Bird) and leads to the activation of c:preys-on. In turn,
c:scared-of becomes active and signals an affirmative answer to the query.
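
The combined effect of the rule base and the type hierarchy on this query can be sketched symbolically. The Python below uses the is-a links listed for Figure 3(a), the rule preys-on(x,y) => scared-of(y,x) and the stored fact preys-on(Cat,Bird); upward propagation is modeled by collecting each entity's ancestors. The function names are assumptions, and the sketch deliberately ignores the multiple instantiation problem for shared ancestors such as Animal.

```python
# is-a links from Figure 3(a), stored as child -> parent.
IS_A = {
    "Bird": "Animal", "Cat": "Animal",
    "Robin": "Bird", "Canary": "Bird",
    "Chirpy": "Robin", "Tweety": "Canary", "Sylvester": "Cat",
}
FACTS = {("preys-on", ("Cat", "Bird"))}


def ancestors_or_self(concept):
    """Concepts that fire in synchrony with `concept` when bottom-up links are enabled."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def holds(pred, args):
    """A stored fact matches if each of its arguments is the query filler or an ancestor of it."""
    return any(
        p == pred and all(f in ancestors_or_self(a) for f, a in zip(f_args, args))
        for p, f_args in FACTS
    )


def scared_of(scaree, scarer):
    """Backward use of preys-on(x,y) => scared-of(y,x): the arguments swap roles."""
    return holds("preys-on", (scarer, scaree))


print(scared_of("Tweety", "Sylvester"))   # rephrased as preys-on(Sylvester,Tweety)?
print(scared_of("Sylvester", "Tweety"))   # rephrased as preys-on(Tweety,Sylvester)?
```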


3.2. Two Technical Problems in Realizing a Type Hierarchy

There  are  two  technical  problems  that  must  be  solved  in  order  to  integrate  the 
type  hierarchy  and  the  rule-based  component. 

First, the encoding of the is-a hierarchy should be capable of representing
multiple instantiations of a concept. For example, in the query discussed above,
the concept Animal would receive activation originating from Tweety as well as
Sylvester. We would like the network's state of activation to represent both 'the
animal Tweety' and 'the animal Sylvester'. This is problematic because the
ρ-btu node representing Animal cannot be in synchrony with both Tweety and
Sylvester at the same time.

Second, the encoding must provide built-in mechanisms for controlling the
propagation of activation in the is-a hierarchy so as to deal correctly with queries
containing existentially and universally quantified variables. Thus:

• Activation originating from an instance or a concept C that corresponds to a
universally quantified variable in the query should propagate upwards to all its
ancestors. Activation propagating upwards is equivalent to checking if the
relevant fact is universally true for some ancestor of C, in which case it is true
for all C.

• If the is-a hierarchy is a taxonomy, then activation originating from a concept
C that corresponds to an existentially quantified variable in the query should
propagate to the ancestors as well as descendants of C. A fact is true for some
object of type C if at least one of the following holds:

- The fact is universally true for an ancestor of C. Activation traveling
upwards from C checks this case.

- The fact is true for some descendant of C. Activation traveling downwards
from C is meant to check this condition.

However, if the is-a hierarchy permits multiple inheritance, then the fact
would be true of C if it is universally true for an ancestor of a descendant of C.
This requires that the activation must also propagate to the ancestors of the
descendants of C. The multiple inheritance situation is illustrated by the
scenario in Figure 4: if ‘all pets are lovable’, then it also follows that ‘there
exists some animal that is lovable’. Given the fact ∀x:Pet lovable(x) (‘all pets
are lovable’), to be able to give an affirmative answer to the query ∃x:Animal
lovable(x)? (‘Is there some animal that is lovable?’), we need to be able to
propagate activation to all the ancestors of the descendants of Animal. Thus, we
require activation originating from a concept which corresponds to an
existentially quantified variable, to propagate to its ancestors, descendants and
ancestors of descendants. Again, the example in Section 4.4 clarifies these
notions.
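
The propagation requirements listed above can be summarised as a small set computation. The following Python sketch is only an illustration: it treats the type hierarchy of Figure 4 as a plain dictionary of is-a links rather than a network of ρ-btu nodes, and the names IS_A, ancestors, descendants and reach are assumptions introduced here, not part of the system described.

```python
# Illustrative sketch (not the connectionist encoding): which concepts must
# activation reach for a universally or existentially quantified variable?

# is-a links of Figure 4: child -> list of parents (multiple inheritance allowed).
IS_A = {
    "Canary": ["Bird"],
    "Bird": ["Animal", "Pet"],
    "Animal": [],
    "Pet": [],
}


def ancestors(concept, is_a=IS_A):
    """All proper ancestors of `concept`."""
    seen, stack = set(), list(is_a.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(is_a.get(node, []))
    return seen


def descendants(concept, is_a=IS_A):
    """All proper descendants of `concept` (every concept must be a key of is_a)."""
    return {c for c in is_a if concept in ancestors(c, is_a)}


def reach(concept, quantifier, is_a=IS_A):
    """Concepts, other than `concept` itself, that activation should reach."""
    up = ancestors(concept, is_a)
    if quantifier == "forall":
        return up  # universal variable: ancestors only
    down = descendants(concept, is_a)
    # Existential variable: also the ancestors of every descendant.
    up_from_down = set().union(*(ancestors(d, is_a) for d in down))
    return (up | down | up_from_down) - {concept}
```

Under these assumptions, reach("Animal", "exists") includes Pet even though Pet is neither an ancestor nor a descendant of Animal, which is exactly the multiple inheritance case discussed above.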

The  next  section  proposes  a  solution  to  these  problems. 

3.3. Implementing the Type Hierarchy

3.3.1. Representing entities. Each entity (i.e. type or instance) C is represented by
a group of nodes called the entity cluster for C. Such a cluster is organized as
shown in Figure 5(a). The entity cluster for C has k1 banks of ρ-btu nodes, where
k1, the type hierarchy multiple instantiation constant, refers to the number of
dynamic instantiations a concept can accommodate. Each bank consists of
three ρ-btu nodes: C_i, C_i↑ and C_i↓. Each C_i represents a distinct dynamic
instantiation of C. If this instantiation is in phase ρ, then C_i fires in phase ρ. The
relay nodes C_i↑ and C_i↓ control the direction of propagation of the activation
represented by C_i. The C_i↑ and C_i↓ nodes have a threshold θ = 2. As shown in
Figure 5(a), C_i is connected to both C_i↑ and C_i↓, and C_i↓ is linked to C_i↑ but not
vice versa.

Directional control of propagating activation is exercised using a suitable
modification of the relay-node scheme discussed in Shastri (1988).

3.3.2. The type hierarchy switch (T-switch). Every entity C is associated with two
type hierarchy switches: a top-down T-switch and a bottom-up T-switch. In
order to avoid confusion with switches introduced to handle multiple dynamic
instantiations of predicates, we shall refer to the switches used in the type
hierarchy as T-switches. The T-switches, both of which are identical in structure,
control the flow of activation in the type hierarchy. Each T-switch has k1 outputs.
Output_i from the bottom-up T-switch connects to C_i and C_i↑, while the
corresponding output from the top-down T-switch goes to the C_i and C_i↓ nodes,
1 ≤ i ≤ k1. The bottom-up T-switch has k1 inputs for each sub-concept of C, while
the top-down T-switch has k1 inputs for each super-concept of C. Further, there
is also feedback from the C_i nodes to both the T-switches (see Figure 5(a) and
Figure 6).

Figure 4. (a) Fragment of a type hierarchy with multiple inheritance, encoding
the fact ∀x:Pet lovable(x). The type hierarchy encodes the following is-a
relations: is-a(Bird,Animal), is-a(Bird,Pet) and is-a(Canary,Bird). (b) Activation
trace for the query ∃x:Animal lovable(x)? (‘Is there an animal that is lovable?’).

Figure 5. (a) Structure of the entity cluster for C and its interaction with the
bottom-up and top-down type hierarchy switches. The ↑ and ↓ nodes have a
threshold θ = 2. The multiple instantiation constant k1 = 3, and represents the
number of instantiations that can be represented in any entity cluster. (b) Encoding
of the is-a relation is-a(A,B). A bundle of wires is represented by a single link.

The interaction between the T-switches and the entity cluster (Figure 5(a))
brings about efficient and automatic dynamic allocation of banks in an entity
cluster, by ensuring that:

• Activation is channeled to the entity cluster banks only if the entity cluster can
accommodate more instantiations; the maximum number of instantiations is,
therefore, limited to k1.

• Each C_i picks up a unique phase; thus, new instantiations are always in a phase
not already represented in the entity cluster.

The architecture of the T-switch with k1 = 3 is illustrated in Figure 6. The k1
ρ-btu nodes, S_1, ..., S_k1, with their associated τ-or nodes form the basic
components of the T-switch. Every input to the T-switch makes two connections,
one excitatory and one inhibitory, to each of S_2, ..., S_k1; these inputs
directly connect to S_1. As a result of these excitatory-inhibitory connections, the
S_2, ..., S_k1 nodes cannot respond to incoming activation. Input activation will
therefore have an effect only on the S_1 node (Figure 6). In keeping with the
behavior of ρ-btu nodes described earlier, S_1 selects an arbitrary active input and
continues to fire in phase with that input as long as it remains active. As S_1 goes
active, the τ-or node associated with S_1 turns ON, thereby enabling S_2 (via the
A link). Inhibitory feedback from C_1 (via its feedback link) ensures that S_2 is
not enabled during the phase ρ in which C_1 is firing. Thus, S_2 selects and starts
firing in a phase other than ρ. Once S_2 has made its selection, S_3 gets its turn, and
so on.

Figure 6. Architecture of the type hierarchy T-switch, which arbitrates the
flow of activation in the type hierarchy. The multiple instantiation constant k1 = 3,
and represents the number of instantiations that can be represented in any
entity cluster.

Note that, in general, C_i could receive input in two phases: one from the
bottom-up T-switch for C and another from its top-down T-switch. C_i, being a
ρ-btu node, picks one of these phases to fire in. As instantiations are deputed to
the entity cluster, the ρ-btu nodes in the T-switch are progressively enabled from
left to right. If C_1, ..., C_(i-1) are firing in phases ρ_1, ..., ρ_(i-1), then S_i always
picks a distinct phase ρ ∉ {ρ_1, ..., ρ_(i-1)}, since inputs in phases ρ_1, ..., ρ_(i-1) are
inhibited by the feedback links from C_1, ..., C_(i-1). At any stage, if C_i, 1 ≤ i ≤ k1,
picks up activation channeled by the other T-switch, feedback from C_i into the
τ-or node associated with S_i causes S_(i+1) to be enabled, even though S_i has not
picked a phase. This mechanism ensures that at most k1 instantiations are selected
jointly by the bottom-up and top-down T-switches; hence, only k1 instantiations
can be channeled to C at worst.
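
The net effect of the two T-switches on an entity cluster can be pictured with a few lines of Python. This is a behavioural sketch only: the function name allocate_banks is hypothetical, phases are modelled as integers, and the order of the input list stands in for the arbitrary choice that a ρ-btu node would make among its active inputs.

```python
def allocate_banks(incoming_phases, k1=3):
    """Return the phases held by banks C_1..C_k1 after T-switch arbitration.

    `incoming_phases` merges the activation arriving through the bottom-up
    and top-down T-switches, in the order in which it happens to be selected.
    """
    banks = []
    for phase in incoming_phases:
        if phase in banks:
            continue  # phase already represented: inhibited by feedback from C_i
        if len(banks) == k1:
            break     # cluster is full: no further instantiation is channeled
        banks.append(phase)
    return banks
```

For instance, allocate_banks([2, 5, 2, 7, 9]) keeps the phases 2, 5 and 7: the repeated phase 2 does not claim a second bank, and phase 9 arrives after all k1 = 3 banks are in use.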


3.3.3. Connecting up the type hierarchy. A fact of the form is-a(A,B) is
represented as shown in Figure 5(b) by connecting the A_i↑, i = 1, ..., k1, nodes to the
bottom-up T-switch for B; and connecting the B_i↓, i = 1, ..., k1, nodes to the
top-down T-switch for A.

Consider a concept C in the type hierarchy. Suppose C_i receives activation
from the bottom-up T-switch in phase ρ. C_i starts firing in synchrony with this
activation. The C_i↑ node is now receiving two inputs in phase ρ (one from the
bottom-up T-switch and the other from C_i; see Figure 5(a)). Since it has a
threshold θ = 2, C_i↑ also fires in phase ρ. This causes activation in phase ρ to
spread eventually to the super-concept of C. Hence, any upward traveling
activation continues to travel upwards, which is the required behavior when C
is associated with a universal typed variable (Section 3.2). Similarly, when C_i
receives activation from the top-down T-switch in phase ρ, both C_i and C_i↓
become active in phase ρ. C_i↑ follows suit, because of the link from C_i↓ to C_i↑, so
that the whole bank now fires in phase ρ. Thus, while any activation traveling
downwards continues to travel downwards, it also sets off upward activation trails
from every concept encountered on its way. This mechanism allows a concept
associated with an existential typed variable to spread its activation eventually to
its ancestors, descendants and ancestors of descendants, which is in keeping with
the desired behavior mentioned in Section 3.2. Note that the behavior of
downward activation is unlike that of upward activation: upward activation just
continues upwards while downward activation, apart from continuing
downwards, also sets off an upward trail.
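
The asymmetry between upward and downward activation can be mimicked by a purely local propagation rule. The sketch below is an abstraction under stated assumptions: the hierarchy is that of Figure 4, phases and banks are ignored, and the names PARENTS, CHILDREN and spread are hypothetical.

```python
from collections import deque

# Hierarchy of Figure 4, stored in both directions.
PARENTS = {"Canary": ["Bird"], "Bird": ["Animal", "Pet"]}
CHILDREN = {"Animal": ["Bird"], "Pet": ["Bird"], "Bird": ["Canary"]}


def spread(start, direction, parents=PARENTS, children=CHILDREN):
    """Return the set of (concept, direction) pairs reached from `start`."""
    reached = set()
    queue = deque([(start, direction)])
    while queue:
        concept, d = queue.popleft()
        if (concept, d) in reached:
            continue
        reached.add((concept, d))
        if d == "up":
            # Upward activation just continues upwards.
            queue.extend((p, "up") for p in parents.get(concept, []))
        else:
            # Downward activation continues downwards ...
            queue.extend((c, "down") for c in children.get(concept, []))
            # ... and also sets off an upward trail from this concept.
            queue.extend((p, "up") for p in parents.get(concept, []))
    return reached
```

Starting downward activation at Animal reaches Bird and Canary and, through the upward trail set off at Bird, also Pet; activation that starts upwards never turns downwards.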

3.4.  Example 

Assuming that each concept in the type hierarchy shown in Figure 3(a) has the
structure indicated in Figure 5(a), the query ∃x:Cat loves(x,Tweety)? (‘Is there a
cat that loves Tweety?’) would be posed by: (i) activating Cat_1 and Cat_1↓ to fire
in synchrony with lover; (ii) activating Tweety_1 and Tweety_1↑ to fire in synchrony
with lovee; and (iii) activating e:loves, the enabler of the loves predicate.

Since Cat is associated with an existential variable, Cat_1 and Cat_1↓ are
activated (see Section 5.2) and this activation spreads to Animal_1 (upwards), and
to Cat-1_1 (downwards). Activation from Cat_1 also spreads to the ancestors of its
descendants. Since the ancestors of the descendants of Cat are Cat and Animal,
and since these concepts already have banks firing in phase with Cat_1, no new
instantiations are introduced. Tweety is an entity directly appearing in the query
and is equivalent to a universally typed variable. Activation from Tweety_1
therefore only propagates upwards, to Canary_1, Bird_1 and Animal_2. Figure 7
shows the resulting spread of activation in the network. The activity of the
corresponding ↑ and ↓ nodes is also indicated.

Activation spreading downwards from Cat_1 causes Cat-1_1 to go active, while
upward activation from Tweety_1 eventually reaches Bird_1. When this happens,
the fact node F2 corresponding to the fact ∃x:Cat, ∀y:Bird loves(x,y) turns ON.
This activates the collector c:loves, resulting in an affirmative answer to the query.

Figure 7. Trace of spreading activation for the query ∃x:Cat loves(x,Tweety)?.
The rule base and the type hierarchy are as shown in Figure 3(a), except that the
entities in the type hierarchy are assumed to have the structure indicated in
Figure 5. Activation of only those nodes relevant to the query is shown. The
time taken for activation to traverse the T-switches has been ignored in order to
simplify the diagram.


4.  Multiple  Dynamic  Instantiation  of  Predicates 

4.1. Overview

As mentioned in Section 1.2, being able to represent multiple dynamic facts about
the same predicate provides several additional capabilities not possible in the
original reasoner. The introduction of multiple instantiation relies on the assumption
that, during an episode of reflexive reasoning, any given predicate need only be
instantiated a bounded number of times. In Shastri and Ajjanagadde (1993a), it
is argued that a reasonable value for this bound is around three. We shall refer to
this bound as the multiple dynamic instantiation constant for predicates, k2.


4.2. Implementing Multiple Dynamic Instantiation

4.2.1. Representing predicates. Since every predicate must now be capable of
representing up to k2 dynamic instantiations, predicates are represented using k2
banks of units. Each bank of an n-ary predicate P consists of τ-and nodes for the
collector (c:P) and enabler (e:P) along with n ρ-btu nodes arg_1, ..., arg_n
representing the arguments of P. Each bank is essentially similar to the predicate
representation used in Shastri and Ajjanagadde (1990). Figure 8 illustrates the
structure of predicates in the system. Note that the enabler, e:P, and the
arguments, arg_1, ..., arg_n, have a threshold θ = 2.

For a given predicate P, the enabler of the ith bank, e:P_i, will be active
whenever the ith bank has been instantiated with some dynamic binding. The
collector c:P_i of the ith bank will be activated whenever the dynamic bindings in
the ith bank follow from the knowledge encoded in the system.


4.2.2. The multiple instantiation switch (M-switch). Every predicate in the
extended system has an associated multiple instantiation switch, referred to as the
M-switch. All connections to a predicate are made through its M-switch. The
M-switch has k2 output cables (see Figure 8), each of which connects to one bank
of the predicate. A cable is a group of wires originating or terminating at a
predicate bank; a cable, therefore, has wires from all the units (collector, enabler
and argument units) in a bank. Each output cable from the M-switch is
accompanied by a latch enable link. Activation of the latch enable link associated with the
ith output cable indicates that the M-switch has successfully selected an
instantiation for the ith bank of the predicate.

The M-switch arbitrates input instantiations to its associated predicate and
brings  about  efficient  and  automatic  dynamic  allocation  of  predicate  banks  by 
ensuring  the  following: 

• Fresh predicate instantiations are channeled to the predicate banks only if the
predicate can accommodate more instantiations.

• All inputs that transform to the same instantiation are mapped into the same
predicate bank. Thus, new instantiations selected for representation in the
predicate are always unique.

Figure 8. An overview of the multiple instantiation system. P and Q are binary
predicates while R is a ternary predicate. Overall connection pattern depicting the
encoding of two rules, one relating P and Q and the other relating P and R, is
shown. Nodes marked with a ‘2’ have a threshold θ = 2. The multiple
instantiation constant k2 = 3, and represents the number of instantiations that can be
simultaneously represented in a predicate.


4.2.3. Structure and operation of the multiple instantiation switch. Figure 9
illustrates the construction of the M-switch. The M-switch consists of k2 groups or
ensembles of units. The figures use k2 = 3. The output of the ith ensemble is a
cable (along with its latch enable link) which connects to the ith bank of the
corresponding predicate.

As can be seen in Figure 9(a), each ensemble consists of an arbitrator bank
and several input banks. The arbitrator consists of n ρ-btu nodes representing the
arguments of the associated n-ary predicate, n - 1 τ-or nodes and two τ-and
nodes for the collector and enabler. Each ρ-btu node, except for the node
representing the first argument, is associated with a τ-or node, as shown in
Figure 9(b). The ith arbitrator bank directly connects with the ith bank of the
predicate. Figure 9(b) shows the detailed structure of the arbitrator and input
banks. Each input bank consists of n ρ-btu units representing the arguments of
the predicate, and two τ-and nodes representing the collector and enabler of the
bank. Each input bank also has a τ-or node associated with it. The cable
terminating at the input bank is an input to the M-switch; the outputs of the
input bank connect to the arbitrator of the respective ensemble. Corresponding
input banks across ensembles are interconnected as shown in Figure 9.

Figure 9. (a) Structure of the multiple instantiation M-switch which arbitrates
the flow of dynamic instantiations into predicates. We have again assumed that
the multiple instantiation constant k2 = 3; this represents the number of
instantiations that can be simultaneously represented in a predicate. Detailed connections
are not shown to avoid cluttering.

Ignoring the associated τ-or nodes, the input banks and the arbitrators have a
structure which exactly mimics the bank structure of the predicate with which the
M-switch is associated. If the predicate has n arguments, the input banks and
arbitrator banks also have n ρ-btu units. The number of lines in the input cable
is decided by the arity of the predicate originating the cable. The number of lines
in the M-switch output depends on the arity of the predicate associated with the
M-switch. Since each input cable is connected to an input bank in each of the k2
ensembles, each ensemble in the M-switch has the same number of input banks.

Figure 9. (b) Structure of the ith ensemble in the M-switch. Only connections
from input bank Q to the arbitrator are shown. Connections to/from other input
banks and the arbitrator are implied. As indicated, connections to e:Arb in the
first ensemble are different.


To start with, incoming instantiations will activate the corresponding input
banks in all the ensembles of the switch. All ensembles in the switch except the
first are disabled and cannot respond to incoming activation for the following
reason: nodes in the arbitrators of all ensembles, except e:Arb in the first
arbitrator, receive both an excitatory and an inhibitory input from their respective
input units. The activation therefore cancels out and the arbitrator nodes in these
other ensembles do not become active. Any activation incident on the switch will
therefore affect only the first ensemble. Activation in one or more input banks of
the first ensemble will cause the enabler in the arbitrator, e:Arb, to become active.
All input banks with inactive enablers (i.e. input banks with no incoming
activation) will be inhibited via the τ-or nodes associated with the respective
input banks. The activation of e:Arb in the first ensemble will block the inhibitory
inputs to the ρ-btu node Arb_arg1, thereby enabling this node to pick a phase to
fire in. This node, in keeping with the behavior of ρ-btu nodes, arbitrarily selects
one of its input phases and begins firing in that phase. As soon as Arb_arg1 selects
a phase to fire in, this phase is communicated to all the input banks, via the τ-or
node associated with the input banks (see Figure 9(b)). For each of the input
banks, the associated τ-or node checks if the phase selected by Arb_arg1 is the same
as the phase in which the first argument of that input bank is firing. If the
phases do not match, the corresponding τ-or node shuts off the entire input bank.
Thus, when Arb_arg1 selects a phase ρ from its input, all activation except that in
which the first argument fires in phase ρ is inhibited.

In the meantime, e:Arb would have activated the τ-or node associated with the
second argument, Arb_arg2, in the arbitrator. This enables Arb_arg2 to select a phase
from the activation remaining after inhibiting instantiations that do not agree with
Arb_arg1. Note that Arb_arg2 is enabled by the associated τ-or node independent of
Arb_arg1 and will select a phase to fire in even if Arb_arg1 is inactive (which would be
the case if all incoming instantiations have an unbound first argument). The
process continues, allowing Arb_arg3, ..., Arb_argn to select phases to fire in. After
Arb_argn has made its choice, the first ensemble would have picked an instantiation
to be channeled to the predicate bank. The latch enable, which originates at the
τ-or node associated with Arb_argn, becomes active and the selected instantiation is
transferred to the first predicate bank. A link from this last τ-or node to e:Arb in
the second ensemble enables the second ensemble to select a fresh instantiation.
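
The argument-by-argument selection performed by an arbitrator bank can be pictured with the following Python sketch. It abstracts away all node-level timing: an instantiation is a tuple of integer phases with None for an unbound argument, the function name select_instantiation is hypothetical, and min() is merely a deterministic stand-in for the arbitrary choice a ρ-btu node makes among its active inputs.

```python
def select_instantiation(input_banks):
    """Pick one instantiation, one argument at a time.

    `input_banks` is a list of equal-length tuples of phases (None = unbound).
    Returns the selected tuple, or None if there is no incoming activation.
    """
    if not input_banks:
        return None
    candidates = list(input_banks)
    chosen = []
    for arg in range(len(candidates[0])):
        phases = [bank[arg] for bank in candidates if bank[arg] is not None]
        # The arbitrator node for this argument selects one of its input
        # phases; it stays inactive (None) if the argument is unbound everywhere.
        phase = min(phases) if phases else None
        chosen.append(phase)
        # Input banks that disagree with the selected phase are shut off.
        candidates = [bank for bank in candidates if bank[arg] == phase]
    return tuple(chosen)
```

Because each choice is made only among the input banks that survived the previous choices, the tuple returned is always one of the incoming instantiations and never a mixture of two of them.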

After the first ensemble has selected an instantiation to be channeled to the
predicate, only those input banks which represent this exact pattern of activation
will be active in the first ensemble. All other input banks will be inhibited due to
a mismatch in the firing pattern. Further, input banks remaining active in the first
ensemble will blot out activation in all corresponding input banks in all the other
ensembles. This ensures that the instantiation selected by the first ensemble will
not be selected again in any other ensemble.

Once the second ensemble is enabled (by blocking inhibitory inputs to e:Arb
in the ensemble), it will pick an instantiation, channel it to the predicate bank,
and enable the third ensemble in the M-switch, and so on. The process continues
until k2 instantiations have been channeled to the predicate, after which any fresh
input instantiations are ignored.

Note that the ensembles in the M-switch have an implicit ordering from left
to right (Figure 9(a)). If the ith ensemble (1 ≤ i ≤ k2) is making its choice, it will
always select an instantiation which is different from those picked by the first
i - 1 ensembles. Further, a new instantiation arriving at the M-switch will be
checked to see if it has already been assigned to a bank in the predicate. If so, the
activation will be diverted to the bank already assigned to it. If not, the
activation is assigned a new bank in the predicate, via the next unused ensemble
in the M-switch. Thus, all the instantiations channeled to the predicate are
unique.
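
Seen from the outside, the M-switch behaves like a bounded allocator that never stores the same instantiation twice. The class below is a behavioural sketch under that reading, not a model of the ensembles themselves; the class name MSwitch and its method offer are hypothetical, and an instantiation is again a tuple of phases.

```python
class MSwitch:
    """Toy model of the bank allocation performed by an M-switch."""

    def __init__(self, k2=3):
        self.k2 = k2
        self.banks = []  # banks[i] holds the instantiation chosen by ensemble i

    def offer(self, instantiation):
        """Return the index of the bank holding `instantiation`, or None if ignored."""
        inst = tuple(instantiation)
        if inst in self.banks:
            # Already assigned: activation is diverted to the existing bank.
            return self.banks.index(inst)
        if len(self.banks) < self.k2:
            # The next unused ensemble channels it to a fresh bank.
            self.banks.append(inst)
            return len(self.banks) - 1
        # All k2 banks are in use: the fresh instantiation is ignored.
        return None
```

With k2 = 3, offering (1, 2), (3, 2), (1, 2) and (1, None) fills banks 0, 1, 0 (again) and 2; any further distinct instantiation is dropped.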

Further, whenever the collector of the ith bank of the predicate associated
with the M-switch becomes active, the activation automatically gets transmitted
to the collector of the predicate bank which originated the instantiation selected
by the ith ensemble. Also note that c:Arb becomes active only if both e:Arb and
the collector of the associated predicate bank are simultaneously active. A more
detailed description of the structure and operation of the M-switch can be found
in Mani and Shastri (1992).


5. Encoding Rules and Facts

Having described the mechanisms used to encode the type hierarchy and
implement multiple dynamic instantiation of predicates, we shall now describe the
encoding of rules and facts in the extended system. Also, queries in the extended
system can contain typed variables; we will discuss how the typed variables are
interpreted and how such queries are answered.

5.1.  Facts 

Facts in the original system (Shastri & Ajjanagadde, 1990) are encoded by wiring
up temporal pattern matchers, as shown in Figure 1(a). The extensions we have
made to the system necessitate the addition of extra machinery in order to match
correctly dynamic instantiations with long-term facts.

Concepts and instances can now accommodate k1 instantiations. A fact should,
therefore, be able to check if any one of these k1 instantiations is in the required
phase. This effect is realized by treating the output of concept clusters to be a
bundle of k1 links. Figure 10 shows the encoding of the fact P(C_1, ..., C_n), where
the outputs of C_1, ..., C_n are considered to be bundles of k1 = 3 wires. Further,
given the capability of the system to encode multiple dynamic instances of a
predicate, the dynamic instantiation which matches a long-term fact could occur in any
one of the k2 banks for that predicate. We therefore need a fact-pattern-matcher
for each of the predicate banks. Thus, any fact P(C_1, ..., C_n) will be encoded
using k2 τ-and nodes, one for each bank of P, as illustrated in Figure 10.
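
The matching condition just described can be written out directly. The sketch below is a simplification with hypothetical names (bank_matches, matching_banks): each concept is represented by the list of phases of its active instantiations, each predicate bank by a tuple of argument phases (or None for an uninstantiated bank), and an unbound argument is assumed, for the purposes of this sketch, to impose no constraint.

```python
def bank_matches(bank, fact, concept_phases):
    """True if one predicate bank matches the long-term fact P(C_1, ..., C_n)."""
    if bank is None:
        return False  # bank not instantiated: its enabler is inactive
    for phase, concept in zip(bank, fact):
        if phase is None:
            continue  # unbound argument (assumed to impose no constraint)
        # Any one of the concept's instantiations may be in the required phase.
        if phase not in concept_phases.get(concept, ()):
            return False
    return True


def matching_banks(banks, fact, concept_phases):
    """Indices of the predicate banks whose fact-pattern-matcher would fire."""
    return [i for i, bank in enumerate(banks)
            if bank_matches(bank, fact, concept_phases)]
```

One matcher is evaluated per predicate bank, mirroring the use of one τ-and node per bank, and each argument is compared against the whole bundle of instantiations of the corresponding concept.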

As mentioned in Section 3.1, a typed, universally quantified variable is treated
as being equivalent to its type, and vice versa. Thus, the facts ∀x:C_A, ∀y:C_B P(x,y),
∀x:C_A P(x,C_B) and ∀y:C_B P(C_A,y) are equivalent to P(C_A,C_B). A typed,
existentially quantified variable is encoded by creating a unique subconcept of the
associated type. Thus, ∃x:C_A P(x,C_B) is encoded as P(C_A',C_B), where C_A' is a
unique subconcept of C_A. This interpretation forces all existential variables to be
outside the scope of the universal variables in the fact. Further, any unspecified
role in a fact is treated as being existentially quantified. For example, ∀x:Cat
preys-on(x,y) would be interpreted as ‘every cat preys on some bird’.

5.2.  Queries 

With the introduction of the type hierarchy, the extended system can answer
queries with typed variables. Though the system can deal with both yes-no and
WH queries, we shall concern ourselves only with yes-no queries.

Consider a query P(..., x, ...)? where x is a typed variable of type C_A, filling
the ith argument of P. To pose the query to the system, the enabler e:P of predicate
P is first activated. Depending on whether x is universally or existentially
quantified, we proceed as follows:

• If x is universally quantified (i.e. the query is of the form ∀x:C_A
P(..., x, ...)?) then C_A1 and C_A1↑ (i.e. the C_1 and C_1↑ nodes in the first bank
of the entity cluster for C_A) are set to fire in synchrony with the ith argument
of P. In order to verify if P(..., C_A, ...) is true, we need to check if
P(..., C, ...) is asserted in the system where C is either C_A or an ancestor of
C_A. Activating C_A1 and C_A1↑ achieves exactly this goal, by virtue of the
activation control mechanism (Section 3.2).

• If x is existentially quantified (i.e. the query is of the form ∃x:C_A
P(..., x, ...)?) then C_A1 and C_A1↓ are set to fire in synchrony with the ith
argument of P. This causes activation to spread to the ancestors, descendants
and the ancestors of the descendants of C_A. As stated in Section 3.2, the
propagating activation searches for facts which would render the current query
true.

Figure 10. Encoding long-term facts in the extended reasoning system. The fact
encoded is P(C_1, ..., C_n). We have assumed that k1 = k2 = 3.

A more detailed justification of the correctness of the above procedure can be
found in Mani and Shastri (1991).

Just as for facts, concepts directly specified in the query predicate are a
shorthand for universal typed variables, i.e. P(..., C_A, ...)? is the same as
∀x:C_A P(..., x, ...)? Universally quantified variables are interpreted to be within
the scope of the existentially quantified variables. Untyped variables are
unspecified roles, and hence will not be assigned a phase when communicating the
query to the network.

5.3. Rules

Rule  encoding  in  the  basic  system  is  illustrated  in  1‘igure  1  ai,  and  a  detailed 
account  can  be  found  in  Shastri  and  Ajjanagadde  ;  1990'.  Here  we  shall  consider 
the  capabilities  and  complications  introduced  by  the  extended  system. 

The type hierarchy can be used to impose type restrictions on variables
occurring in rules, for both forward and backward reasoning. To utilize this
feature, we need to modify the encoding of rules. In a forward reasoning system,
a rule is encoded by introducing a τ-or node to perform type checking for the
argument in question. Figure 11(a) shows the encoding of the rule ∀x:T1,
y:T2 ∃z:T3 P1(x,y) ∧ P2(x,z) ⇒ Q(y). Here, g1, g2 and g3 are τ-or nodes for type
checking which turn ON only if the corresponding predicate arguments are
bound to objects of the right type. For example, g2 would go ON only if the
second argument of P1 and at least one of the instantiations of T2 are in
synchrony, which is to say that the argument is bound to an object of type T2.
As indicated in the figure, links from T1, T2 and T3 are bundles of k1 wires each.
It is also evident from Figure 11(a) that the rule will not fire (and predicate Q
will not go active) unless all the g-nodes (g1, g2 and g3) are active. In a backward
reasoner, the strategy is similar, except for two basic differences. First, type
checking for a typed universally quantified variable is enforced by a bundle of k1
inhibitory links from the concept (see Figure 11(b)) representing the type of the
concerned argument. Second, for a typed, existentially quantified variable, the
inhibitory links for type enforcement are derived from a unique subconcept of
the associated type. The network which implements the rule ∀x:T1 ∃y:T2
P(x) ⇒ Q(x,y) for backward reasoning is sketched in Figure 11(b). Any type
mismatch causes g1 to block further propagation of activation. Thus, in both the
forward and backward reasoners, the encoding mechanism ensures that a rule
fires only if all typed arguments are firing in synchrony with their respective
types.
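The type check can be illustrated with a small sketch. It is not the authors' network: phases are modeled as plain integers, a type is represented by the set of phases in which its instantiations are currently firing, and the names `type_check_node` and `rule_fires` are hypothetical.

```python
from typing import Optional, Sequence, Set


def type_check_node(arg_phase: Optional[int], type_phases: Set[int]) -> bool:
    """Model of a type-checking node: ON only if the argument fires in
    synchrony with at least one active instantiation of the type."""
    return arg_phase is not None and arg_phase in type_phases


def rule_fires(arg_phases: Sequence[Optional[int]],
               type_phase_sets: Sequence[Set[int]]) -> bool:
    """The rule fires only if every typed argument passes its check."""
    return all(type_check_node(a, t) for a, t in zip(arg_phases, type_phase_sets))


# x fires in phase 1, y in phase 2, z in phase 3.
# T1 has an instantiation in phase 1, T2 in phases 2 and 5, T3 in phase 3.
ok = rule_fires([1, 2, 3], [{1}, {2, 5}, {3}])

# Same bindings, but no instantiation of T2 fires in the phase of y.
blocked = rule_fires([1, 2, 3], [{1}, {5}, {3}])
```

In the second call a single type mismatch is enough to keep the rule from firing, which mirrors the requirement that all the type-checking nodes be active.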

When multiple instantiation of predicates is introduced, the rule connectivity
indicated above will need to be replicated k2 times and rule wiring will need to be
routed through the M-switch. Figure 8 illustrates rule encoding at a very gross
level. Figure 12 gives a more detailed description of rule encoding in the extended
system. Figure 12 depicts the encoding of the rule ∀x,y P(x,y) ⇒ Q(y,x).

Each bank of predicate Q is connected to an input bank in every ensemble of
the switch for P. Consider the connection from the ith bank of Q to the
corresponding input bank in the jth ensemble of the switch for P. The input cable
from bank i of Q connects to the input bank as though the input bank itself
represented the predicate P. Thus, the connection pattern between the bank of Q
and the input bank is identical to the connection pattern between the actual
predicates in the system of Shastri and Ajjanagadde (1990). In particular, for the
example in Figure 12, we have the following connections for 1 ≤ i ≤ k2 and
1 ≤ j ≤ k2:

• The enabler e:Qi of the ith bank of Q is connected to the enabler in the
corresponding input bank of the jth ensemble of the switch for P.

• The collector in the same input bank is linked to c:Qi, the collector of the ith
bank of predicate Q.

• The first argument of the ith bank of Q connects to the second argument in the
input bank while the second argument of the predicate bank connects to the
first argument of the input bank.
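The connection pattern in the list above can be enumerated mechanically. The following is only a sketch: the string labels for nodes, the function name `rule_wiring`, and the assumption that the switch for P has exactly k2 ensembles are illustrative choices, not details confirmed by the text.

```python
def rule_wiring(k2, arg_map):
    """List the links for a rule P(...) => Q(...) in the backward reasoner.

    arg_map[a] is the argument of the input bank (standing in for P) that
    argument a of Q connects to. All indices are 1-based.
    """
    links = []
    for i in range(1, k2 + 1):        # banks of Q
        for j in range(1, k2 + 1):    # ensembles of the switch for P (assumed k2)
            inp = f"switch(P).ensemble{j}.input_for(Q{i})"
            # Enabler of bank i of Q drives the enabler of the input bank.
            links.append((f"e:Q{i}", f"{inp}.enabler"))
            # Collector of the input bank drives the collector of bank i of Q.
            links.append((f"{inp}.collector", f"c:Q{i}"))
            # Argument links follow the variable correspondence of the rule.
            for q_arg, p_arg in sorted(arg_map.items()):
                links.append((f"Q{i}.arg{q_arg}", f"{inp}.arg{p_arg}"))
    return links


# Rule: forall x,y P(x,y) => Q(y,x). The first argument of Q is y (second
# argument of P) and the second argument of Q is x (first argument of P).
links = rule_wiring(k2=3, arg_map={1: 2, 2: 1})
```

With k2 = 3 this yields four links for each of the nine (bank, ensemble) pairs, which makes the replication cost of a single rule explicit.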


Figure 11. (a) Network encoding the rule ∀x:T1, y:T2 ∃z:T3 P1(x,y) ∧ P2(x,z) ⇒
Q(y) in a forward reasoning system. g1, g2 and g3 are τ-or nodes which perform
type checking, apart from enforcing other constraints; θ represents the node
threshold. (b) Network encoding the rule ∀x:T1 ∃y:T2 P(x) ⇒ Q(x,y) in a backward
reasoning system. T2′ is a unique subconcept of T2; g1 is a τ-or node. We
have assumed that k. =

Figure 12. Encoding rules in the extended reasoner with multiple instantiation
of predicates. The rule encoded is ∀x,y P(x,y) ⇒ Q(y,x). Nodes marked with a '2'
have a threshold θ = 2. To avoid cluttering, only part of the connections are
indicated.


Since the ith bank of Q is connected to an input bank in every ensemble of the
switch, the collector c:Qi of this bank receives inputs from the respective
collectors in the input banks of all the ensembles of the switch. The τ-or unit associated
with the input bank ensures that the collector c:Qi is activated if and only if the
instantiation received by the input bank has been channeled to the predicate, and
the collector in the corresponding bank of predicate P is active (refer to Figure
9(b)). In other words, the predicate collector c:Qi would be activated if the
activation of Qi has been channeled to Pj and the collector c:Pj is active. The
combination of collectors in the arbitrator and the input bank therefore serves as
a mechanism for transmitting the state of the collector c:Pj to the collector c:Qi of
predicate Q.
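The activation condition for the collector of a bank of Q reduces to a simple Boolean expression. The sketch below states it under the assumption that each ensemble j of the switch contributes one (channeled, collector) pair; the function name and list encoding are hypothetical.

```python
def collector_q_active(channeled, collector_p_active):
    """State of the collector of bank i of Q.

    channeled[j] is True if the instantiation sent from bank i of Q was
    channeled to bank j of P; collector_p_active[j] is the state of the
    collector of bank j of P. The collector of Q turns on only if, for some
    j, both conditions hold.
    """
    return any(ch and cp for ch, cp in zip(channeled, collector_p_active))


# Channeled to bank 2 of P, whose collector is active.
hit = collector_q_active([False, True, False], [True, True, False])

# Channeled to bank 1 of P, but only the collectors of banks 2 and 3 are active.
miss = collector_q_active([True, False, False], [False, True, True])
```

The second case shows why the conjunction is needed: an active collector in some other bank of P says nothing about the instantiation that bank i of Q actually sent.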

Rules with multiple predicates in the antecedent are handled by an extension
of the above procedure. Figure 13 gives a network which encodes the rule ∀x,y,z
P(x,y) ∧ Q(y,z) ⇒ R(x,y,z). The g3i nodes check that the dynamic activation
of the ith bank of R follows from the facts for both P and Q. Note that g3i
will become active irrespective of which banks of P and Q contain the activation
which triggered the required facts.

Figure 13. Encoding rules with multiple predicates in the antecedent. The rule
encoded is ∀x,y,z P(x,y) ∧ Q(y,z) ⇒ R(x,y,z). The nodes marked g3 are τ-and
nodes; θ represents the node threshold. Nodes marked with a '2' have a threshold
θ = 2. To avoid cluttering, only relevant connections are indicated.

The encoding of a rule with repeated variables in the consequent, existential
variables in the consequent and constants in the consequent is shown in Figure
14. The output of the g1 nodes inhibits the links from Q to the predicate(s) in the
antecedent of the rule. Note that the scheme is identical to the one used to handle
such conditions in Shastri and Ajjanagadde (1990), except that we repeat the
scheme for each of the k2 banks. Further, the inhibitory links from entities would
be bundles of links.
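One ingredient of this scheme, a node that becomes active when it receives input in more than one phase within a period of oscillation (see the caption of Figure 14), is easy to state in code. The sketch below assumes, for illustration only, that both argument positions filled by a repeated variable feed one such node and that an unbound argument contributes no input; the function names are hypothetical.

```python
def multi_phase_node(input_phases):
    """Active if input arrives in more than one distinct phase in a period."""
    return len(set(input_phases)) > 1


def repeated_variable_blocked(phase_arg1, phase_arg2):
    """Assumed use of the node for a consequent such as Q(x, x, ...).

    Both positions of the repeated variable x feed the same node. Unbound
    arguments are passed as None and contribute no input.
    """
    phases = [p for p in (phase_arg1, phase_arg2) if p is not None]
    return multi_phase_node(phases)


same_entity = repeated_variable_blocked(1, 1)      # both bound to one entity
different = repeated_variable_blocked(1, 2)        # bound to two entities
one_unbound = repeated_variable_blocked(1, None)   # second position unbound
```

Only the case in which the two positions are bound to different entities activates the node, which is the situation in which the rule should not be used.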


Figure 14. Encoding rules with special conditions (repeated variables,
existentially quantified variables and constants in the consequent) in the backward
reasoner. The rule encoded is ∀x ANTECEDENT ⇒ ∃y Q(x,x,y,A). The antecedent of
the rule has been left unspecified since the mechanisms used to handle the special
conditions are confined to modifying the activation from the consequent
predicate. The g1 nodes are τ-or nodes. The g2 nodes are like τ-or nodes except that
they become active if they receive input in more than one phase within a period
of oscillation.

5.4. Complexity of the Network

The  extended  reasoning  system  requires  C  ,  A’,  Y.  +  k,  /  ki  ■  nodes,  where 
is  the  total  number  of  entities  in  the  system,  .  f  is  the  total  number  of  long-term 
facts  present  in  the  reasoner,  and  .>*  is  the  sum  of  the  arities  of  all  predicates  in 
the rule base. k1 and k2 are the multiple instantiation constants for the type
hierarchy and the rule base, respectively. As in Shastri and Ajjanagadde (1990),
the network complexity is at most linear in the size of the knowledge base.

As for time complexity, the system can answer queries in time proportional to
the length of the shortest derivation, irrespective of the number of rules and facts
encoded in the system. Compared with the original system, the constant of
proportionality is now slightly larger, since we also need to consider the time
required for activation to propagate through the switches. Given a predicate P,
the best case propagation time for activation passing through its M-switch is
proportional to n, the arity of P; in the worst case, propagation time is
proportional to k2 n. If we assume that n_max is the maximum arity of any predicate in
the reasoning system, then the constant of proportionality for the time complexity
will be proportional to n_max (in the best case) or k2 n_max (in the worst case),
irrespective of the predicate under consideration. The time taken for activation to
traverse the type hierarchy is independent of k1, and is only dependent on the
number of is-a links that need to be traversed in order to answer the query. The
time taken to answer a query will be proportional to the maximum of the time
taken for activation to spread in (i) the rule base and (ii) the type hierarchy.
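The two bounds on switch delay and the final combination can be written down directly. This is a sketch in arbitrary time units: the proportionality constants are not given in the text and are simply taken to be 1 here, and the function names are hypothetical.

```python
def switch_delay_bounds(arity, k2):
    """(best, worst) propagation time through an M-switch, arbitrary units.

    Best case grows with the arity of the predicate; worst case grows with
    the arity multiplied by the multiple instantiation constant k2.
    """
    return arity, k2 * arity


def query_time(rule_base_time, type_hierarchy_time):
    """Activation spreads in both structures; the slower one dominates."""
    return max(rule_base_time, type_hierarchy_time)


# A binary predicate with k2 = 3 banks.
best, worst = switch_delay_bounds(arity=2, k2=3)

# Hypothetical spreading times for the rule base and the type hierarchy.
total = query_time(rule_base_time=5.0, type_hierarchy_time=3.0)
```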

5.5. Multiple Instantiation in a Forward Reasoning System

So far, we have looked at how the basic reasoning system could be extended to
accommodate multiple instantiations of predicates in the backward reasoner. We
now consider issues that arise when incorporating multiple instantiation of
predicates in the forward reasoner.

In a forward reasoning system, predicates have the same structure as in the
backward reasoning system. As before, every predicate has an associated multiple
instantiation M-switch. Rules with a single predicate in the antecedent can be
encoded directly: each bank of the antecedent predicate is connected to input
banks in every ensemble of the M-switch for the consequent predicate. Rules with
multiple predicates in the antecedent, however, require special consideration.
Suppose we have a rule of the form ∀x,y,z P(x,y) ∧ Q(y,z) ⇒ R(x,y,z). Suppose
also that we are given the dynamic facts P(A,B) and Q(B,C). Then we should be
able to conclude R(A,B,C). But the dynamic fact P(A,B) could be represented in
any of the k2 banks allocated for P. Similarly Q(B,C) could be active in any of the
k2 banks allocated for Q. To conclude R(A,B,C), we would need to pair each bank
of P with all the banks of Q and check if the second argument of P is the same
as the first argument of Q; in other words, we need to check if the second
argument of Pi is the same as the first argument of Qj for 1 ≤ i,j ≤ k2. The
obvious solution to this problem requires O(k2^m) nodes and links to encode each
multiple antecedent rule, where m is the number of predicates in the antecedent
of the rule and k2 is the multiple instantiation constant for the rule base.
Typically, we expect the value of k2 to be around 3 (as argued in Shastri and
Ajjanagadde, 1993a), and m to be around 2. Generally, in a rule containing an
antecedent with several predicates, most of the antecedent predicates function to
specify constraints on the arguments of one or two key predicates. Since the
reasoning system can handle rules with typed variables, most of the predicates
enforcing type constraints can be replaced by typed variables. For example, the
rule ∀x,y collide(x,y) ∧ animate(x) ∧ solid-obj(y) ⇒ hurt(x) with three
predicates in the antecedent is equivalent to the simple rule ∀x:animate, y:solid-obj
collide(x,y) ⇒ hurt(x). The latter rule can be directly encoded in the extended
reasoning system. Even if this 'compression' of the antecedent were not possible,
we could always introduce dummy predicates and split a rule with several
predicates in the antecedent into several rules with just a few predicates in the
antecedent. Thus, with typical values of k2 ≈ 3 and m ≈ 2, the extra cost of
encoding rules in the forward reasoner with multiple instantiation is a factor of
about 10 (≈ 3^2).
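The growth in encoding cost comes from having to consider one bank from each antecedent predicate in every possible combination. A minimal sketch of that count, using the typical values quoted above (the function name `bank_combinations` is hypothetical):

```python
from itertools import product


def bank_combinations(k2, m):
    """All ways of choosing one bank from each of m antecedent predicates,
    where every predicate has k2 banks (numbered from 1)."""
    return list(product(range(1, k2 + 1), repeat=m))


# Typical values from the text: about 3 banks, about 2 antecedent predicates.
combos = bank_combinations(k2=3, m=2)
count = len(combos)
```

Each tuple, such as (1, 3), stands for one pairing of a bank of P with a bank of Q that the encoded rule must be able to check, so the count is 3 to the power 2, that is 9, in line with the factor of about 10 mentioned above.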

Special conditions in a rule (like repeated variables, existential variables in the
antecedent, constants in the antecedent, etc.) can be handled as usual, before
connecting a predicate bank to the input banks in the M-switch.

Incorporating multiple instantiation in the forward reasoner gives us the
capability to encode rules like ∀x,y,z loves(x,y) ∧ loves(y,z) ⇒ jealous(x,z), and
infer jealous(John,Tom) given loves(John,Mary) and loves(Mary,Tom).
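The inference in this example can be traced with a symbolic stand-in for the banks. This sketch is not the network itself: equality of the shared variable is tested by comparing names, whereas the network detects it through synchronous firing, and the three-bank list and the function name `infer_jealous` are assumptions made for illustration.

```python
def infer_jealous(loves_banks):
    """Apply loves(x,y) & loves(y,z) => jealous(x,z) over all bank pairs."""
    derived = set()
    for first in loves_banks:
        for second in loves_banks:
            if first is None or second is None:
                continue  # an empty bank holds no instantiation
            x, y1 = first
            y2, z = second
            if y1 == y2:  # the repeated variable y must be the same entity
                derived.add((x, z))
    return derived


# Three banks for the predicate loves; the third is unused.
banks = [("John", "Mary"), ("Mary", "Tom"), None]
result = infer_jealous(banks)
```

Both facts are instantiations of the same predicate, which is why the inference is only possible once a predicate can hold more than one dynamic instantiation at a time.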

5.5.1. Forward and backward reasoning. The structure of the M-switch used in
both the forward and backward reasoners is identical. The differences in
connectivity in the forward and backward reasoners arise from the differences in rule
encoding. Further, special conditions in rules (like repeated variables,
constants, existential variables, etc.) are dealt with differently in the forward and
backward reasoners. Despite the similarities in the M-switch structure, the
incompatible differences in the network structure of the forward and backward
reasoners would make it difficult to use the same network for both forward and
backward reasoning.

By carefully defining the interface between the forward and backward reasoners,
we could have a system with both the forward and backward reasoners
functioning independently and at the same time exchanging inferences and
predictions, thereby complementing each other. Work is being done on developing
such a system.

5.6. Constraints

The original system described in Shastri and Ajjanagadde (1990) and the
extended system described here are tractable, limited inference systems (Shastri,
1993a). Though the system can handle a large class of rules and facts, there are
some constraints on the form of rules and facts. A brief description of these
constraints is provided here. A more detailed description along with psychological
implications of these constraints can be found in Shastri and Ajjanagadde (1993a)
and Shastri (1992).

In a backward reasoning system, where activation flows from the consequent
predicate to the antecedent predicate(s), any predicate argument in the antecedent
that requires some condition to be enforced must occur in the consequent and get
bound during a given episode of reasoning. Thus, typed variables, repeated
variables, existential variables and constants which occur in the antecedent of a
rule must occur in the consequent and get bound during any episode of reasoning.
For example, the rule ∀x,y,z loves(x,y) ∧ loves(y,z) ⇒ jealous(x,z) cannot be
encoded in the backward reasoner since the antecedent has a repeated variable y
which does not occur in the consequent. In fact, it has recently been shown
(Dietz et al., 1993) that the repeated variable constraint (repeated variables in
the antecedent must occur in the consequent and get bound during any episode
of reasoning) is essential in order to draw inferences in time independent of the
size of the knowledge base.
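The static half of the repeated variable constraint can be checked syntactically. The sketch below only tests whether a repeated antecedent variable occurs in the consequent; whether it actually gets bound during an episode of reasoning depends on the query and is not modeled. The rule encoding as (predicate, arguments) tuples and the function name are assumptions.

```python
from collections import Counter


def unbound_repeated_variables(antecedent, consequent):
    """Return antecedent variables that repeat but are absent from the consequent.

    antecedent: list of (predicate, args) tuples.
    consequent: a single (predicate, args) tuple.
    """
    counts = Counter(v for _, args in antecedent for v in args)
    consequent_vars = set(consequent[1])
    return sorted(v for v, n in counts.items()
                  if n > 1 and v not in consequent_vars)


jealous_rule = ([("loves", ("x", "y")), ("loves", ("y", "z"))],
                ("jealous", ("x", "z")))
violations = unbound_repeated_variables(*jealous_rule)

# A rule whose repeated variable y does appear in the consequent.
r_rule = ([("P", ("x", "y")), ("Q", ("y", "z"))], ("R", ("x", "y", "z")))
no_violations = unbound_repeated_variables(*r_rule)
```

A non-empty result marks a rule that cannot be encoded in the backward reasoner as it stands.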

Constraints similar to the above hold for the forward reasoner: typed
variables, repeated variables, existential variables and constants which occur in the
consequent must occur in the antecedent and get bound during any episode of
reasoning. Thus, the rule ∀x,y,z ∃t1,t2 move(x,y,z) ⇒ present(x,y,t1) ∧
present(x,z,t2) cannot be used for forward reasoning since the existential
variables t1 and t2 do not occur in the antecedent.

While representing facts and posing queries which involve typed variables,
only those situations where all the universally quantified typed variables are
within the scope of the existential typed variables can be represented. Further,
concepts and predicates can only represent a limited number of dynamic
instantiations.

The reasoning system also introduces the constraint that only a small number
of entities can be simultaneously active at any given time. This may not be
restrictive for any given episode of reasoning, but can be limiting when complex,
interlinked chains of reasoning are required. In such cases, the set of entities in
'focus' must keep changing dynamically, recycling the available phases (see
Section 5.7).
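The limit on simultaneously active entities can be pictured as a table with a fixed number of phase slots. The sketch below is illustrative only: the class name `PhaseTable`, the default of ten phases, and the choice to raise an error when no phase is free are assumptions, and no mechanism for recycling phases is modeled.

```python
class PhaseTable:
    """Tracks which entity occupies which phase in a period of oscillation."""

    def __init__(self, max_phases=10):
        self.max_phases = max_phases
        self.phase_of = {}

    def activate(self, entity):
        """Return the phase of an entity, allocating a free one if needed."""
        if entity in self.phase_of:
            return self.phase_of[entity]
        if len(self.phase_of) >= self.max_phases:
            raise RuntimeError("no free phase: too many active entities")
        phase = len(self.phase_of)
        self.phase_of[entity] = phase
        return phase


table = PhaseTable(max_phases=2)
john = table.activate("John")
mary = table.activate("Mary")
john_again = table.activate("John")  # an active entity keeps its phase
```

A third distinct entity would fail to obtain a phase in this two-slot table, which corresponds to the situation in which the focus of reasoning would have to shift.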

5.7. Significance of the Constraints

The reasoning system is psychologically and cognitively plausible in that it
provides verifiable predictions and pointers which further our understanding of
reflexive reasoning (Shastri, 1992; Shastri & Ajjanagadde, 1993a). The constraints
listed in Section 5.6 predict what kinds of rules may participate in reflexive
reasoning and what kinds of rules will need 'reflective' reasoning. An example
would be the rule 'if x loves y and y loves z, then x is jealous of z'. Suppose
we are told that 'John loves Mary' and 'Mary loves Tom'; then we can reflexively
infer that 'John is jealous of Tom'. But if we assume that the facts 'John loves
Mary' and 'Mary loves Tom' are encoded as long-term knowledge, we will find
it difficult to answer the query 'Is John jealous of Tom?' in a reflexive manner.
Observe that the rule 'if x loves y and y loves z, then x is jealous of z' can be used
reflexively in the forward reasoner but not in the backward reasoner since the
antecedent contains a repeated variable y which does not occur in the consequent
(Section 5.6).

Another prediction introduced by the system is based on the multiple
instantiation constraint. A predicate cannot be instantiated more than a (small) fixed
number of times. Thus, we would expect to have difficulty dealing with too many
instantiations at once. For example, we would normally find it difficult to answer
questions about who loves whom reflexively after having been told (without
repetitions) that: Susan loves Tom, John loves Lisa, Tom loves Mary and Clara
loves Tom.
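The effect of the multiple instantiation constraint on this example can be sketched as a predicate with a fixed number of banks. The value k2 = 3 follows the typical value quoted in Section 5.5; the function name and the policy of simply setting aside instantiations that do not fit are assumptions, since the text does not say what happens to the overflow.

```python
def load_instantiations(facts, k2=3):
    """Place dynamic instantiations of one predicate into at most k2 banks.

    Returns the instantiations that found a bank and those that did not.
    """
    banks, overflow = [], []
    for fact in facts:
        if len(banks) < k2:
            banks.append(fact)
        else:
            overflow.append(fact)
    return banks, overflow


# The four 'loves' facts from the example, in the order they are told.
told = [("Susan", "Tom"), ("John", "Lisa"), ("Tom", "Mary"), ("Clara", "Tom")]
banks, overflow = load_instantiations(told)
```

With three banks, one of the four facts cannot be held as a dynamic instantiation, which is the source of the predicted difficulty.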

One of the constraints stated in Section 5.6 was that only a small number of
distinct entities (seven to ten) can be simultaneously active in the system at any
given time. When taken together with other memory mechanisms, we argue that
this is not as restrictive as it sounds. As discussed in Shastri and Ajjanagadde
(1993a), when involved in tasks which require keeping track of a large number of
entities over reasonable time spans (like reading a novel or participating in a
conversation), those dynamic facts that are relevant could enter a medium-term
memory where they would be available for a much longer time. Some of these
facts might even be converted to long-term facts, which persist for a long time.
There could arise situations, requiring complex, interlinked chains of
reasoning, where the total number of entities could exceed the limit imposed by
our system. In such cases, the set of entities in 'focus' must keep changing
dynamically, recycling the available phases. Identifying mechanisms that underlie
such internal 'shifts of attention' and cause the system's activity to evolve
smoothly remains a challenging open problem.

These constraints also have implications for other 'reflexive' processing
phenomena besides reasoning. Henderson's (1993) work on parsing shows that the
above constraints help in explaining some of the limitations of human parsing by
modeling several linguistic phenomena involving long distance dependencies,
garden path effects and our limited ability to deal with center-embedding.

5.8.  Simulations 

The reasoning system has been tested using a simulator (Mani, 1991) developed
to run on the Rochester Connectionist Simulator (Goddard et al., 1989). The
simulator runs as a 'shell' on top of the Rochester simulator. It provides an input
language for entering rules, facts and queries. A network encoding the input
knowledge is automatically built. The simulator can construct stand-alone
forward or backward reasoning systems, or a combined reasoning system with the
forward and backward reasoners forming independent layers. The simulation can
be run interactively and the progress of the simulation monitored using graphic
displays.

A knowledge base containing about 100 rules, 25 facts and 50 is-a facts has
been simulated. The resulting network requires about 7100 nodes for the
backward reasoner, about 8700 nodes for the forward reasoner and about 1200 nodes
for encoding the type hierarchy. The node count for the backward reasoner
includes nodes used to encode facts. The type hierarchy contains a total of about
60 concepts and instances. Based on these simulations, it can be argued that the
reasoning system can draw a class of inferences in a few hundred
milliseconds. In a system made up of slow 'neurons', with a firing period π of about
20 ms and with the assumption that ρ-btu nodes can synchronize within two
firing periods, we can arrive at the following timing estimates: the system takes
about 260 ms to infer that 'John is Mary's spouse' given 'Mary is John's spouse'.
Given that 'John bought a novel', the system takes about 320 ms to conclude that
'John owns a book'. With 'John bought a novel' encoded as a long-term fact, the
system can answer 'yes' to the queries 'Did John buy a novel?' and 'Does John
own a book?' in 140 ms and 420 ms, respectively.
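The timing estimates can be re-expressed as a number of firing periods by dividing by the assumed period of 20 ms. The sketch below does only this arithmetic; the dictionary labels are shorthand introduced here, and nothing is implied about how the original estimates were derived.

```python
PERIOD_MS = 20  # firing period assumed in the timing estimates

estimates_ms = {
    "infer: John is Mary's spouse": 260,
    "infer: John owns a book": 320,
    "query: Did John buy a novel?": 140,
    "query: Does John own a book?": 420,
}

# Convert each estimate from milliseconds to firing periods.
periods = {label: ms / PERIOD_MS for label, ms in estimates_ms.items()}
```

On this reading, the four estimates correspond to 13, 16, 7 and 21 periods of oscillation, respectively.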

Prototype  implementations  of  the  reasoning  system  have  been  developed  on 
the  Connection  Machines  CM-2  and  CM-5,  and  initial  results  have  been  very 
encouraging.  The  prototype  system  can  encode  knowledge  bases  containing  over 
a  hundred  thousand  rules  and  facts,  and  can  answer  queries  requiring  an 
inference  depth  of  up  to  ten  in  times  ranging  from  a  few  milliseconds  to  a  few 
hundred milliseconds on the CM-5.

6. Related Work

Though there have been several attempts to develop connectionist rule-based
reasoning systems, very few of the approaches concern themselves with reflexive
reasoning. A detailed discussion of the advantages and disadvantages of several
other approaches (both connectionist and otherwise) can be found in Shastri and
Ajjanagadde (1993a). Here, we shall limit the discussion to type hierarchy
implementations, multiple dynamic instantiation of predicates, and the interaction
between the type hierarchy and the rule base.

Early work like Fahlman (1979) and Shastri (1988) considered efficient and
massively parallel implementation of is-a hierarchies. But these systems did not
deal with the explicit representation of rules involving n-ary predicates.

DCPS, a distributed connectionist production system introduced by
Touretzky and Hinton (1988), was limited in that it could only deal with single
variable rules, and like a classical production system, could only fire one rule at
a time. TPPS, described in Dolan and Smolensky (1989), is a production system
which uses tensor products to encode dynamic bindings. Like DCPS, TPPS is
also serial at the knowledge level in that it can only fire rules serially. The
restrictive nature of these systems, both in terms of expressive power and in
terms of effective use of parallelism, renders them unsuitable for successfully
modeling reflexive reasoning. DCPS and TPPS use a distributed encoding in that
arguments and fillers are represented as patterns of activation over groups of
nodes. These systems, therefore, inherit the advantages and disadvantages of
distributed connectionist systems. The advantages of distributed representations
stem from their ability to capture similarity. The seriality (at the knowledge level)
imposed by representing roles and fillers as patterns of activity over a common
pool of nodes, however, constitutes the major disadvantage of using distributed
representations.

The key to capturing similarity in a distributed representation is the use of
shared representation. As stated in Shastri and Ajjanagadde (1993b), the type
hierarchy in the reasoning system also leads to such a sharing of representation by
viewing the encoding of an entity as a distributed pattern over the collection of
nodes that make up the type hierarchy. If one augments the representation of
types with attribute values (Shastri & Feldman, 1986; Shastri, 1988), then the
'distributed' nature of the representation of each entity becomes even more
apparent. For predicate arguments, the reasoning system uses a representation
where each role is localized in the abstract representation. It is this abstract
localization of roles that enables the system to (i) overcome the inherent seriality
of distributed representations and (ii) support knowledge-level parallelism.

CONPOSIT, a system introduced by Barnden and Srinivas (1991), uses
relative position encoding and pattern-similarity association to solve the variable
binding problem. Issues that we have considered in this paper, like encoding the
type hierarchy, dealing with typed variables, etc., are not considered in Barnden
and Srinivas's paper (1991). Rather than reflexive reasoning, CONPOSIT is
tailored to handle a more complex class of rules and seems to be better suited for
complex reflective reasoning, that is, reasoning which requires reflection, conscious
thought and deliberation. CONPOSIT is also serial at the knowledge level.

The connectionist reasoning system most similar in approach to the one
proposed here is ROBIN (Lange & Dyer, 1989). ROBIN was developed to
address ambiguity resolution in language understanding using evidential
knowledge. In recent work, ROBIN has been extended to deal with case-based
reasoning in combination with rule-based reasoning (Wharton et al., 1992). Unlike
the temporal synchrony approach, ROBIN can deal with an arbitrary number of
entities during reasoning. ROBIN can do so because it permanently allocates a
unique signature to each concept in the system, and sustains dynamic bindings by
propagating these signatures. As discussed in Shastri and Ajjanagadde (1993a),
there are certain drawbacks to propagating signatures in a connectionist network.
Furthermore, the use of signatures does not lead to the type of psychological
predictions that can be made using the temporal synchrony approach.

CONSYDERR is a connectionist rule-based reasoning system which uses a
two-level architecture (Sun, 1991). One level uses a distributed representation
while the other uses a localist representation. The two levels interact to provide
a robust system that can handle partial, fuzzy and uncertain information. Though
this system has an implicit type hierarchy built into the architecture, it does not
consider how multiple dynamic instances of a predicate can be represented.

7.  Conclusion 

Adding a type hierarchy allows the reasoning system to represent is-a
relationships efficiently and supports the occurrence of types as well as instances in rules,
facts and queries. Being able to represent multiple dynamic instances of a
predicate adds the capability to draw inferences using rules that capture
symmetry, transitivity and recursion, provided the number of multiple instantiations
required to draw a conclusion remains bounded. The extended reasoning system
can therefore draw a much wider range of inferences. Though the resulting
reasoner is relatively complex compared to the original system, the increased
inferential power seems worth the added complexity, especially since we do not
lose much in terms of efficiency.

The reasoner can perform high-level reasoning while still utilizing the massive
parallelism inherent in connectionist systems. The resulting system can draw
inferences in time which is independent of the size of the knowledge base. The
system is also scalable in that very large knowledge bases can be handled
tractably. Work is being done on mapping the system on to massively parallel
SIMD and MIMD machines with the objective of attaining real-time
performance (Mani, 1993).

Support for the neural plausibility of the system is provided by the fact that
it is based on temporal synchrony. Recent neurophysiological data (Gray et al.,
1991) suggest that temporal synchrony may be used in the animal brain to
represent dynamic bindings. A more detailed discussion of the cognitive,
biological and psychological aspects of the reasoning system can be found in Shastri and
Ajjanagadde (1993a).

This paper does not discuss the issue of learning. There has, however, been
some preliminary work describing how such a system might convert dynamic
bindings into medium-term facts (Geib, 1990). Developing effective learning
algorithms is an exciting future research direction and an outline of a learning
scheme for such a system is discussed in Shastri (1993b). Some of the other
extensions being investigated include: dealing with facts and queries involving
more flexible interleaving of quantifiers; expanding the system to allow
property-value attachments to concepts in the type hierarchy; combining the forward and
backward reasoners to function as an integral reasoning system; and introducing
evidential-preference rules, especially via learning.

Notes

I,  There  have  been  strnie  notable  exceptions;  see  .Section  h. 

2. Older-than(John's-father, John) follows from the knowledge that fathers are older than their
children.

3. The solution to the multiple instantiation problem proposed in this paper is distinct from that
outlined in Shastri and Ajjanagadde < 19*)!)', where two levels of temporal synchrony are used to
deal with multiple instantiations of predicates.

4. In the rest of this paper, we assume that synchronous activity of nodes is oscillatory. As explained
in Shastri and Ajjanagadde (1993a), however, oscillatory activity is not required for proper
functioning of the model; the crucial requirement is that appropriate nodes synchronize. We
have assumed oscillatory activity for convenience.

5. This is in keeping with the natural default meaning associated with statements like 'cats prey on
birds', which generally means 'all cats prey on all birds'.

6.  This  corresponds  to  the  use  of  a  skolem  constant. 

7. This applies to a predicate in the backward reasoning system. In a forward reasoner, the collector,
c:/’, and the arguments have a threshold θ = 2. These nodes require a threshold
of θ = 2 in order to use a latch enable link to signal when the bank should go active (Figure Hi).

8. Distinct from the T-switch used in the type hierarchy.

9. The enabler node c . lrfi plays the role of the r-or node for the first argument in the arbitrator.
Thus, if we have a unary predicate, the latch enable link will originate from c . lrfi.

10. If the node is receiving inputs in several phases, it picks one arbitrarily. In a simulation system,
this could involve selecting a phase at random, selecting the first phase, selecting the last phase,
and so on. The simulation system we use (Section r'.S) selects the first phase. In a physical system,
however, the phase in which a node fires will depend on complex interactions between the relevant
nodes and this interaction may even be chaotic.

11. Suppose the new instantiation arriving has already been assigned to bank j. In such a case, the
inhibition on the corresponding input bank in ensemble j will be removed when the instantiation
arrives. The input bank will become active and will blot out this instantiation from ensembles
j + 1 ... k, thereby automatically assigning this instantiation to bank j.

12. This is similar to the manner in which typed existential variables in a fact are interpreted.

13. gl, refers to the g.1 node for the ith bank of the predicate.

14. Constants denote entities in the domain.

15. Though the multiple instantiation M-switch associated with a predicate in the forward reasoning
system is structurally identical to the M-switch used in the backward reasoner, there are a few
minor functional differences. Figure 9 shows connections in the context of a backward reasoning
system. In a forward reasoner, the enabler in the arbitrator, c . lrfi, connects to the collector of the
associated predicate, while the enablers in the input banks receive inputs from the collectors of the
input predicate banks. The collectors in the arbitrator and input banks are left unconnected, as are
the enablers in the predicate banks.

16. Each role, however, can be represented by a cluster of nodes, thereby providing a physically
distributed representation.
Abstract


We describe an alternate approach to visual recognition of hand-printed words,
wherein an image is converted into a spatio-temporal signal by scanning it in one or
more directions, and processed by a suitable connectionist network. The scheme offers
several attractive features including shift-invariance and explication of local spatial
geometry along the scan direction, a significant reduction in the number of free
parameters, and the ability to process arbitrarily long images along the scan direction. Other
salient features of the work include the use of a modular and structured approach for
network construction and the integration of connectionist components with a procedural
component to exploit the complementary strengths of both techniques. The system
consists of two connectionist components and a procedural controller. One network
concurrently makes recognition and segmentation hypotheses, and another performs
refined recognition of segmented characters. The interaction between the networks is
governed by the procedural controller. The system is tested on three tasks: isolated
digit recognition, recognition of overlapping pairs of digits, and recognition of ZIP
codes.


1  Introduction 


A device capable of hand-print recognition has numerous applications in diverse areas such as postal sorting,
print-to-voice transcription devices for the visually handicapped and human-machine interaction.^ Given
its importance and scope, the hand-print recognition problem has received considerable attention from
researchers in the fields of pattern recognition and machine vision for over 30 years (e.g., (Bledsoe and
Browning, 1959; Highleyman, 1961; Chow, 1962; Duda and Fossum, 1966; Munson, 1968; Pavlidis and Ali,
1975; Caskey and Jr, 1973; Yamamoto and Mori, 1979; Lam and Suen, 1988; Gader et al., 1991; Suen et al.,
1992; Le Cun et al., 1990; Blackwell et al., 1992; Garris et al., 1992; Knerr et al., 1992; Fukushima et al., 1983;
Burr, 1988; Denker et al., 1989; Shridhar and Badreldin, 1987; Fenrich and Krishnamoorthy, 1990; Keeler
et al., 1991)). In fact, it is perhaps one of the oldest and most explored problems in computer science. Yet the
problem still remains largely unsolved. The difficulty in developing an effective solution to the problem can
be attributed to the extremely high variance of unconstrained hand-print. This variance is due to a number
of factors including: mechanical differences in stylus and writing surface, inter-author variations such as
writing style, slant, and handedness, and even intra-author differences related to the purpose of writing and
the mood of the author. Taken together, these factors introduce tremendous variability.

At the word recognition level the problem is further confounded by variations in inter-character
spacing. Since hand-print is not constrained to a uniform pitch, adjacent characters frequently touch or have
overlapping bounding boxes. This gives rise to the character segmentation problem in which overlapping
characters must be teased apart prior to recognition. Doing so, however, is not straightforward since
overlapping characters lead to the segmentation and recognition dilemma: in order to segment a pair of
characters, the characters must first be recognized, but in order to recognize the characters, they must first
be segmented. Given this dilemma and the high degree of variance in hand-print, it is not surprising that
the problem of hand-print recognition has remained largely unsolved.^

In this paper we investigate a particular approach to visual pattern recognition and describe its application
to hand-printed character and word recognition. A key feature of our approach is that we treat spatial images
as time-varying spatio-temporal signals and process them using appropriate connectionist networks. Some
other salient features of our approach are (i) the use of a modular and structured approach for network
construction and (ii) the integration of connectionist components with a procedural component to exploit
the complementary strengths of both techniques.

^The  scope  of  the  problem  can  partially  be  gauged  by  the  fact  that  the  United  States  Postal  Service  alone  processes  over  80 
million  hand-printed  pieces  of  mail  every  day. 

^In contrast to hand-print recognition, excellent results have been obtained for reading of machine printed text where single
character error rates as low as 0.01% have been reported (Schürmann, 1982).
Motivation


The variance inherent in pattern recognition problems such as hand-print recognition suggests the
utilization of a system capable of learning complex, linearly non-separable, and fuzzy categories from examples.
Connectionist networks offer a powerful framework for pursuing this approach and their strength has been
demonstrated in a variety of pattern recognition problems including speech recognition (e.g., (Watrous, 1990;
Waibel et al., 1989; Bourlard and Morgan, 1994)), face recognition (e.g., (Cottrell and Metcalfe, 1991)) and
even visual hand-print recognition (e.g., (Denker et al., 1989; Le Cun et al., 1990; Keeler et al., 1991)).
Connectionist solutions are also attractive because once a network is trained, its simplicity, homogeneity,
and parallelism can be exploited by VLSI technology. An entire network can be etched on a single microchip
and, consequently, can attain very rapid recognition rates. Implementation is therefore relatively accessible,
inexpensive, and attractive.

Visual pattern recognition schemes typically operate upon static images whereby an image is presented
to a system as a time-invariant signal. This is also true of most connectionist approaches to hand-print
recognition (e.g., (Denker et al., 1989; Le Cun et al., 1990; Keeler et al., 1991)). An alternate viewpoint is to
consider an image to be a time-varying signal which is presented to a system in a piecewise fashion over time.
For example, one could envisage a left-to-right scan of an image in which a system receives the ith column of
the image at time i. Such a scan converts a static image into a spatio-temporal signal extending over several
time steps. This approach offers several advantages: it leads to shift-invariance along the temporalized
dimension, it explicates the local spatial relationships in the image along the temporalized dimension, it
requires networks with fewer free parameters (weights), and it allows the assimilation of arbitrarily long
images along the temporalized direction. These advantages are discussed in Section 2.

As is now widely recognized, training random or minimally organized networks using general purpose
learning techniques is not a feasible methodology for obtaining scalable solutions to complex learning
problems. We therefore adopt a more structured approach wherein we incorporate some prior structure in our
networks and embed pretrained feature-detectors along with other “hidden” units. We also adopt a modular
approach in order to make learning tractable. For example, instead of training a monolithic network for
recognizing all the ten digits, we develop a separate network for each digit. Taken together, the use of structure
and modularity allows the incorporation of domain knowledge, reduces the number of free parameters, and
simplifies error analysis.

Figure 1: An overview of the hybrid system.

Although connectionist networks possess attractive features for pattern recognition applications, in many
domains there is abundant domain knowledge that can be utilized effectively by traditional procedural
techniques in a convenient manner.^ Consider recognizing hand-printed ZIP codes, for example. A
well-formed ZIP code will contain either five digits or nine digits (and perhaps a dash). This constraint can
easily be exploited by a procedural controller. Other domain specific knowledge (e.g., statistics gathered
from envelopes arriving at particular postal branches in the case of ZIP codes) and standard dictionary
based algorithms can also be implemented effectively using a procedural approach. This suggests a hybrid
approach, wherein fast and robust connectionist networks perform recognition in concert with a procedural
component capable of incorporating systematic domain knowledge, heuristics, and well-studied algorithms.

^This does not mean that connectionist models cannot incorporate such knowledge. The issue is simply one of adopting a
technique that is suitable for expressing and utilizing certain types of knowledge.
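As an illustration of how cheaply a procedural component can express the ZIP code length constraint mentioned above, the following minimal sketch filters candidate digit strings by shape. It is not the controller used in this work (which encodes no such high-level knowledge in the reported experiments); the names `is_well_formed_zip` and `filter_hypotheses` and the exact pattern are assumptions made for the example.

```python
import re

# Assumed shape: five digits, optionally followed by four more digits,
# with an optional dash between the two groups.
ZIP_PATTERN = re.compile(r"\d{5}(-?\d{4})?")


def is_well_formed_zip(candidate: str) -> bool:
    """Return True if the candidate string has the shape of a ZIP code."""
    return ZIP_PATTERN.fullmatch(candidate) is not None


def filter_hypotheses(hypotheses):
    """Keep only the candidate strings that satisfy the shape constraint."""
    return [h for h in hypotheses if is_well_formed_zip(h)]
```

A controller could apply such a filter to competing recognition hypotheses before choosing among them, for example `filter_hypotheses(["19104", "1910", "19104-6389"])`.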


1.1  Preview 

We  have  developed  a  system  for  hand-printed  word  recognition  using  the  concepts  described  above.  The 
system  recognizes  well-printed  word  images  containing  white  space  between  characters  as  well  as  more  difficult 
images  in  which  characters  are  ill-formed,  disjoint,  or  overlapping. 
The system consists of two connectionist networks and a procedural controller (see Figure 1). One network,
called the Coarse Recognition Device (CRD), assimilates a word image in a left-to-right fashion over time
and performs coarse character recognition. While doing so, it also hypothesizes segmentation boundaries
between characters. The other network, called the Refined Recognition Device (RRD), specializes in isolated
character recognition, and attempts to classify portions of the image hypothesized to be characters by the
CRD. The two networks are governed by a conventional procedural controller, capable of fusing signals
emanating from the two networks while incorporating domain knowledge. The final recognition is the result
of the combined effort of the three components. Our focus in this work has primarily been the development
of the two connectionist components and the evaluation of the spatio-temporal approach since we perceived
these to be the most challenging aspects of our approach. Consequently, the procedural component has received
only limited attention.

The system (without any high-level domain knowledge encoded in the procedural controller) was tested on
three tasks: isolated digit recognition, recognition of overlapping pairs of digits, and recognition of ZIP codes.
On a test set of 2,700 isolated digits, provided by the United States Postal Service, the system achieved a
96.0% accuracy. On a test set of 207,000 isolated digits, provided by the National Institute of Standards
and Technology, a 96.5% accuracy was attained. Six sets of 500 images of digit pairs whose rectangular
bounding boxes overlapped were synthesized from isolated digits for testing. The sets differed depending on
the degree of overlap in their bounding boxes (0%, 5%, or 10% of the first box width). System accuracy
ranged from 87.6% to 65.6%, and it was seen that performance on pairs drawn from the test set closely
tracked performance on pairs drawn from the training set. Finally, recognition performance was measured
on a set of 540 real-world ZIP code images, provided by the United States Postal Service. Using a criterion
in which a ZIP code classification was deemed correct if and only if the produced digit string matched the
complete ZIP code exactly, the system achieved a 66.0% accuracy. Note that the 66% rate is a “worst-case”
measurement: it considers a classification of an entire ZIP code incorrect in the event that any constituent
digit is incorrect.

The rest of the paper is organized as follows. In Section 2 we present the spatio-temporal approach to
pattern  recognition  and  argue  that  it  offers  a  number  of  advantages.  In  Section  3  we  describe  a  hybrid  and 
modular  spatio-temporal  system  for  hand-print  recognition  that  instantiates  this  approach.  We  discuss  the 
methodology  for  training  and  testing  the  system  in  Section  4  and  present  empirical  results  in  Section  5.  We 
conclude  with  a  general  discussion  and  an  outline  of  future  directions  in  Section  6. 

2  The  spatio-temporal  approach 

Visual pattern recognition schemes, including connectionist ones, typically operate on static images whereby
an image is presented to the system as a time-invariant signal (Denker et al., 1989; Le Cun et al., 1990;
Martin and Pittman, 1990). This approach has produced good results in isolated character recognition and
has also been applied with limited success to word recognition (Keeler et al., 1991).

Figure 2: A static “0” image (left) and the spatio-temporal input generated by a left to right scan (right). In
the latter, the vertical axis enumerates input units, the horizontal axis is time, and the third axis indicates
the level of activation.

An alternate approach is to convert an image into a time-varying signal by scanning it in one or more
directions and presenting the resulting spatio-temporal signal to the recognition system. For example, if a
system scans an n × m image from left to right, it receives the n pixels in column i of the image at time
i. This converts the static image into a spatio-temporal signal that extends over m time steps and has a
spatial span of n. Figure 2 graphically illustrates this by showing the spatio-temporal signal generated by a
left to right scan of a “0”. The image of a “0” is shown on the left and the image as it would be received by
a network’s input units is shown to the right. The horizontal (x) axis represents time, while the vertical (y)
axis enumerates 30 input units. The plot for each input unit depicts its activation level over time (the levels
of activation can be viewed as being represented along the z axis orthogonal to the page).
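The left-to-right scan described above can be sketched in a few lines. This is only an illustration of the conversion, with the image assumed to be stored as a list of n rows of m pixel values; the function name `scan_left_to_right` is hypothetical.

```python
def scan_left_to_right(image):
    """Yield one column of an n x m image per time step.

    `image` is assumed to be a list of n rows, each a list of m pixel values.
    The column yielded at time t is the input to the n input units at time t.
    """
    n = len(image)
    m = len(image[0]) if n else 0
    for t in range(m):
        yield [image[row][t] for row in range(n)]


# A toy 3 x 4 image: the resulting signal spans 4 time steps with spatial span 3.
image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
signal = list(scan_left_to_right(image))
```

Because the scan is a generator, the number of time steps is set by the image width rather than by the network, which is what allows arbitrarily long images to be presented.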

2.1  Advantages  of  the  spatio-temporal  approach 

Time-varying  signals  arise  naturally  in  problems  such  as  speech  recognition  and  time  series  prediction  where 
the  input  signal  has  an  explicit  temporal  aspect.  But  what  is  their  significance  for  visual  recognition?  We 
discuss  the  answer  below  and  point  out  what  we  think  are  inherent  advantages  in  considering  images  as 
spatio-temporal  signals. 

Most work on visual pattern recognition treats an image as a static two-dimensional pattern. Therefore the
suggestion that images be treated as spatio-temporal signals may seem counter-intuitive. A little reflection,
however, makes it apparent that a static view of visual processing is unrealistic. In general, an agent must
scan  its  environment  in  order  to  locate  and  identify  objects  of  interest.  Even  in  a  more  restricted  setting  such 
as  recognizing  ZIP  codes  on  pieces  of  mail,  a  device  must  scan  the  face  of  the  envelope  to  locate  the  region 
containing  the  ZIP  code.  Finally,  even  if  the  (starting)  location  is  known,  scanning  is  required  if  the  image 
contains  a  number  of  objects.  Observe  that  reading  text  essentially  involves  processing  a  continuous  stream 
of  visual  data  having  an  arbitrary  extent.  Thus  scanning  is  an  integral  part  of  visual  processing.  On-line 
character  recognition  systems  that  use  values  of  the  position  (and  optionally,  velocity  and  acceleration)  of 
the  pen  over  time  to  recognize  characters  (e.g.,  (Guyon  et  al.,  1991;  Schenkel  et  al.,  1993))  can  also  be  viewed 
as  systems  that  “scan”  the  characters  as  they  are  being  constructed  over  time. 

Shift-Invariance 

A recognition system which responds identically to an object regardless of the spatial location of the object
is shift-invariant. In pixel-level image recognition using traditional connectionist networks, the number and
arrangement of input units typically correspond to the number and arrangement of pixels in the input
image. Since an object may appear at different spatial locations in different images, the relevant data may
be assimilated by different sets of input units. Hence, a method must be devised for recognition regardless
of which set of input units receives the data. An obvious but significant advantage of our approach is that
it naturally leads to a recognition system that is shift-invariant along the temporalized axis (or axes). When
an image is scanned, any ‘white space’ in the image generates a zero input and leaves the network state
unaffected. Thus the network ignores ‘white space’ and responds to the object it is trained to recognize
wherever (or whenever) it encounters that object in the image. Thus shift-invariance along the temporalized
axis falls out as a natural byproduct of the approach.
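The effect can be illustrated with a toy example. The sketch below is not the network used in this work: it assumes a single linear unit fed through links with delays 0 to k-1 (the names `peak_response`, `pattern` and `weights` are hypothetical) and shows that padding the scanned input with blank columns changes when the unit peaks but not how strongly it responds.

```python
def peak_response(columns, weights):
    """Scan `columns` in order and return (peak activation, time of peak).

    weights[d][j] is the weight on the link with delay d from input unit j,
    so the unit sees a window of k = len(weights) consecutive columns.
    """
    k, n = len(weights), len(weights[0])
    # history[d] holds the column that arrived d time steps ago (blank at start).
    history = [[0.0] * n for _ in range(k)]
    best, best_t = float("-inf"), None
    for t, column in enumerate(columns):
        history = [list(column)] + history[:-1]
        activation = sum(
            w * x
            for link_weights, past_column in zip(weights, history)
            for w, x in zip(link_weights, past_column)
        )
        if activation > best:
            best, best_t = activation, t
    return best, best_t


# Toy two-column pattern on two input units, and weights matched to it.
pattern = [[1, 0], [0, 1]]
weights = [[0, 1], [1, 0]]  # delay 0 sees the newest column, delay 1 the previous one
blank = [0, 0]

unshifted = peak_response(pattern, weights)
shifted = peak_response([blank] * 3 + pattern + [blank] * 2, weights)
```

Here `unshifted` and `shifted` have the same peak activation; only the time step at which the peak occurs differs, mirroring the "wherever (or whenever)" remark above.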

The  spatio-temporal  approach  explicates  the  image  geometry 

The local spatial relationships in the image along the temporalized dimension are naturally expressed in the
scanned input. Consider a unit in the first hidden layer of a traditional (static) network. The activations
received by this unit from units in the input layer are unlabeled levels of activation, and hence, this unit
cannot determine which inputs come from spatially neighboring pixels. As far as the hidden unit is
concerned, the input it receives from an image I is indistinguishable from the input it would receive from the
image I' obtained by permuting the pixels of I. Now consider a hidden unit in the spatio-temporal network.
The inputs to this unit from two adjacent pixels (along the temporalized dimension) become available in
adjacent time steps. Hence the spatio-temporal approach makes spatial locality explicit by mapping it into
temporal locality.
Reduction in network complexity

In the spatio-temporal approach a spatial dimension is replaced by the temporal dimension and this leads
to models that are architecturally less complex than similar models that use two spatial dimensions. This
reduction of complexity occurs because the spatial extent of any feature in an object’s image is much less
than the spatial extent of the object’s image. Consider the case where an object’s image is n × m. Let the
extent of the image along the temporalized dimension be m and let the maximum width of any feature along
this dimension be k. Typically k will be significantly less than m. A traditional network for recognizing this
object would require n × m input units. Observe, however, that the processing ability of a traditional network
can be replicated by a spatio-temporal network containing only n input units connected to hidden units via
a bundle of k links with propagation delays ranging from 1 to k.^ The use of multiple links with varying
delays allows a hidden node in the spatio-temporal network to receive inputs arising from a limited window
(or receptive field) of height n and width k. The limited width of this receptive field, however, is sufficient
since it exceeds the size of all features in the image. Furthermore, as scanning progresses, this receptive field
slides along the scan direction and fully traverses the image. If we assume that hidden units act as feature
detectors then the moving receptive field of a hidden node in the spatio-temporal model leads to an effective
tessellation of the feature detector over the image without the actual (physical) replication of the feature
detector.

In view of the equivalent processing power of a spatio-temporal network and a traditional network, it can
be argued that while the number of links required by a spatio-temporal network is proportional to n × k, the
number of links in a traditional network will be proportional to n × m. Typically, k is much less than m
and therefore the spatio-temporal model will require significantly fewer links. During training, the number
of links in the network corresponds to the number of free parameters in a non-linear optimization process,
and a substantial reduction in the number of free parameters can yield faster optimization.
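The proportionality argument can be made concrete with a small calculation. In the sketch below only n = 30 comes from the text (the number of input units in Figure 2); the image width m = 40 and feature width k = 8 are hypothetical values chosen for illustration, and full input-to-hidden connectivity is assumed.

```python
def links_per_hidden_unit_static(n, m):
    """Links into one hidden unit of a static network fully connected to n x m inputs."""
    return n * m


def links_per_hidden_unit_spatiotemporal(n, k):
    """Links into one hidden unit fed by n input units through k delayed links each."""
    return n * k


n, m, k = 30, 40, 8  # m and k are assumed values, not taken from the paper
static_links = links_per_hidden_unit_static(n, m)
temporal_links = links_per_hidden_unit_spatiotemporal(n, k)
reduction = static_links / temporal_links
```

Under these assumed values each hidden unit needs 240 links instead of 1,200, a reduction by the factor m/k.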

Processing  arbitrarily  long  inputs 

A common difficulty of the connectionist approach to pattern recognition is that a network must have a fixed
number of inputs, and thus must process images of a fixed size. This makes it difficult for a conventional
connectionist model to recognize words, although progress has been made by replicating and tessellating
network substructures to accommodate images with multiple characters (Keeler et al., 1991). In contrast,
the ability to process arbitrarily long images is inherent in our approach. It offers an alternate means to
process word images within a connectionist framework and relaxes the restriction of fixed-size inputs (see
Section 3).

^The discussion assumes that input units are fully connected to hidden units. The basic point, however, also holds for limited
connectivity between input and hidden layers.
Addressing the segmentation and recognition dilemma

The  use  of  scanning  also  partially  solves  the  segmentation/recognition  dilemma.  Most  vision  systems  perform 
a  segmentation  step  and  then  attempt  to  recognize  the  segments.  This  approach  is  feasible  as  long  as  objects 
are  non-occluding.  If  an  image  contains  several  objects  that  touch  and/or  overlap,  segmentation  becomes 
problematic and the system is faced with the segmentation/recognition dilemma. As explained in Section 3,
the recognizer can continually update the activation level of its output units as the image is being scanned
from left to right, and this activation trace may also be used to estimate the segmentation point.

2.2  Spatio-temporal  networks 

Processing a spatio-temporal signal requires a model capable of processing time-varying signals. A number
of researchers have proposed network models to represent and process such signals (e.g., (Elman, 1990;
Jordan, 1987; Lapedes and Farber, 1987; Mozer, 1989; Waibel et al., 1989; Watrous and Shastri, 1986)). The
connectionist model we employed was inspired by the Temporal Flow Model (TFM), which has achieved good
results in speech recognition (Watrous, 1990; Watrous, 1991). TFM supports arbitrary link connectivity
across layers, admits feedforward as well as recurrent links, and allows variable propagation delays to be
associated with links. These features provide a means for smoothing and differentiating signals, measuring
the duration of features, and detecting their onset. They also allow the system to maintain context over
a window of time and thereby carry out spatio-temporal feature detection and pattern matching. Taken
together, the use of recurrent links and variable propagation delays provides a rich mechanism for short-term
memory, integration and context sensitivity (properties that are essential for processing time-varying signals)
and a potentially powerful mechanism for performing feature detection and pattern recognition.
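To make the role of recurrent and delayed links concrete, the following sketch shows two toy linear units. It is not the TFM itself: the unit definitions, the function names `smooth` and `difference`, and the chosen weights are assumptions made for illustration. A unit with a recurrent self-link smooths its input, while a unit that combines an undelayed link (weight +1) with a delayed link (weight -1) differentiates it and so marks onsets and offsets.

```python
def smooth(signal, self_weight=0.5):
    """Linear unit with a recurrent self-link: y[t] = a * y[t-1] + (1 - a) * x[t]."""
    y, out = 0.0, []
    for x in signal:
        y = self_weight * y + (1.0 - self_weight) * x
        out.append(y)
    return out


def difference(signal, delay=1):
    """Linear unit with two input links: weight +1 with no delay, weight -1 with `delay`."""
    out = []
    for t, x in enumerate(signal):
        delayed = signal[t - delay] if t >= delay else 0.0
        out.append(x - delayed)
    return out


trace = [0, 0, 1, 1, 1, 0]
edges = difference(trace)      # positive at the onset, negative at the offset
smoothed = smooth([1, 1, 1])   # rises gradually towards 1
```

Longer delays in `difference` would respond to features of longer duration, which is one way such links can measure how long a feature persists.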

Spatio-temporal networks also have a sound basis in biology. It is well known that circuits for auditory
processing in animals make explicit use of propagation delays (e.g., see Edelman et al., 1988). Similarly,
propagation delays, delay tuned neurons, and coincidence detectors are used by bats for echo-location and
by the barn owl for localization of objects via the detection of differences in inter-aural timing (e.g., see
(Carr and Konishi, 1990)).

3  The  word  recognition  system 

3.1  Overview 

The complete system (refer to Figure 1) consists of three components: the Refined Recognition Device
(RRD), the Coarse Recognition Device (CRD), and the Procedural Controller (PC). The system’s ability to deal
with disjoint as well as overlapping digits stems from the interaction between these components.

Without loss of generality, assume that an image is being scanned in one direction. The spatio-temporal
signal resulting from the scan is input to a CRD which is a spatio-temporal network trained to act as a
coarse recognizer. The CRD has one output unit for each class in the domain. As the image is scanned, the
activation level of each CRD output unit indicates the degree of support for the presence of a token of the
associated class in the region currently being scanned. When the support for any class reaches a threshold,
the scanning stops and the CRD hypothesizes the presence of a token of the appropriate class. At this time,
the relevant region of the image is extracted and processed by the RRD, the refined recognition network
which specializes in recognizing isolated digits. RRDs are also spatio-temporal networks which process an
extracted region by scanning it in one or more directions. On the completion of processing, the RRD either
confirms or rejects the CRD’s hypothesis. If the hypothesis is confirmed by the RRD, the system announces the
presence of the appropriate digit at the appropriate location in the image and the CRD continues its scan of the
image. If the RRD rejects the hypothesis, it considers (overlapping) regions in the immediate vicinity of the
region under consideration and tries to locate the hypothesized object. If the hypothesized object is still not
found, the CRD continues its scan of the image.^

Figure 3: Output unit response of the Coarse Recognition Device in response to a set of images depicting
touching or overlapping pairs of digits. Sharp peaks in response correspond to recognition of a digit and
subsequent resetting of the CRD.

The interaction between the CRD and RRD is mediated by the procedural controller (PC). It is the PC
which detects that one of the CRD output units has reached threshold, extracts the relevant portion of the
image, and passes it on to the RRD.

^In the actual system implementation (see Section 5.3), the process described above is preceded by a connected component
extraction step. A connected component is simply a set of “on” points in the image such that any two points belonging to the
same component are connected by a path of adjacent “on” bits. Connected components can be extracted by a simple scan of
the image and a parallel connectionist implementation is described in (Fontaine, 1993). Each connected component so obtained
is first processed by the RRD. If the RRD recognizes a component as a digit with high confidence, the component is deemed to
be that digit. All the remaining components are processed by the CRD and RRD in the manner described above.

Figure 4: RRD output unit response to a typical set of ZIP code digit images
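
The connected component extraction step mentioned in the footnote can be illustrated with a short sequential sketch. This is not the parallel connectionist implementation of (Fontaine, 1993); it is a plain breadth-first labeling, and the use of 8-connectivity (diagonal neighbors count as adjacent) is an assumption, since the text only speaks of adjacent “on” bits. The function name is hypothetical.

```python
from collections import deque

def connected_components(image):
    """Return a list of components, each a sorted list of (row, col) "on" pixels.

    `image` is a list of rows of 0/1 values. 8-connectivity is an assumption;
    the text only says that adjacent "on" bits are connected.
    """
    rows, cols = len(image), len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Start a new component and grow it breadth-first.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(component))
    return components
```

Components are returned in the order in which a row-by-row scan first meets them, so each one can then be handed to the recognition modules in turn.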

Figure 3 shows the response of the CRD to a set of touching and overlapping pairs of digits.
Sharp peaks correspond to recognition of a digit and the subsequent resetting of the CRD by the PC. Figure 4
shows the output unit response of the RRD network to some typical isolated ZIP code digit images.

Basic  architecture  of  CRD  and  RRD 

Both  CRD  and  RRD  networks  are  spatio-temporal  networks  with  multiple  hidden  layers,  feedforward  as  well 
as  recurrent  connections,  and  multiple  links  -  with  variable  delays  -  between  units.  Each  network  typically 
consists  of  four  layers:  an  input  layer,  two  hidden  layers,  and  an  output  layer. 

The number of units in the input layer is determined by the number of image pixels “seen” at each step of
the scanning process. For example, if an n × m image (i.e., an image with n rows and m columns) is scanned
from left to right, the number of input units is n. If the image is scanned in multiple directions, there are
separate banks of input units, one for each scan direction.
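
As a rough illustration of how a scan turns an image into a temporal signal, the sketch below converts an n × m image into a sequence of input vectors, one per time step. The function names and the particular choice of four scan directions are assumptions made for the example; the text only states that each scan direction has its own bank of input units.

```python
def scan_columns(image):
    """Left-to-right scan: one n-element input vector (a column) per time step."""
    n_rows, n_cols = len(image), len(image[0])
    return [[image[r][c] for r in range(n_rows)] for c in range(n_cols)]

def multi_scan(image):
    """Four scans of the same image, one per bank of input units.

    The choice of these four directions is an assumption.
    """
    flipped_lr = [row[::-1] for row in image]
    transposed = [list(col) for col in zip(*image)]
    flipped_tb = [row[::-1] for row in transposed]
    return {
        "left_to_right": scan_columns(image),
        "right_to_left": scan_columns(flipped_lr),
        "top_to_bottom": scan_columns(transposed),
        "bottom_to_top": scan_columns(flipped_tb),
    }
```

For a left-to-right scan of an n × m image this yields m time steps of n values each, which matches the n input units described above.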

The first hidden layer is best viewed as a layer of feature detectors. Each unit in this layer has an associated
receptive field and is expected to detect the occurrence of some salient feature(s) in this field. As pointed
out in Section 2.1, the receptive field of a unit is temporalized and moves in the direction of scan during
processing.

Most of the units in the feature detector layer are adaptable and, during training, ‘learn’ to detect
appropriate feature(s) in the image. In addition to these adaptable units, some pre-trained feature detectors can
be embedded in the hidden layer. We do this by including units connected to input units via appropriately
weighted links that enable these units to detect features such as oriented bars. The second hidden layer
receives inputs from the feature detector units in the first hidden layer. Units in the second layer integrate the
response of feature detectors and adapt so as to detect complex features and non-local feature combinations
required to recognize objects in the image. We now describe each component in more detail.

3.2  Refined  recognition  device  (RRD) 

The RRD is responsible for accurate recognition of isolated hand-printed digits. We have developed the RRD
in a modular manner in order to incorporate domain knowledge, reduce the number of free parameters, and
simplify network analysis. The RRD consists of ten individually trained Single Digit Recognition Networks,
each of which is responsible for the detection of a particular digit. Each Single Digit Recognition Network
consists of four Single Scan Networks, each of which assimilates data from a different “scan” of the image.
A Single Scan Network is constructed from a number of adaptable layers, operating in conjunction with a
number of pretrained Feature Detection Modules. A Feature Detection Module is formed by the replication
and tessellation of a pretrained Local Receptive Field.

Feature  detection  modules 

Most numerals can be approximately written using four simple stylus strokes: horizontal, vertical, slash,
and backslash. The simplicity and recurrence of these strokes suggests the utility of developing pretrained
feature detection modules, which can be integrated into a larger network. A separate Local Receptive Field
module (or LRF) was pretrained to detect each of these four features over a localized area.

The generic LRF module is seen in Figure 5. It receives input over a spatial field of 4 inputs, a temporal
field of 4 time steps, and consists of 4 input units, 4 hidden units, and a single output unit. Hidden unit n
receives information from all input units, and utilizes n links from each input unit, with respective delays of
1, 2, ..., n, creating a spatial window of width n into the temporal signal. As long as a feature to be detected
by an LRF is present in its 4 by 4 receptive field, the LRF will emanate an output signal, albeit with a slight
lag. Various LRF modules for detecting horizontal, vertical, slash, and backslash strokes were trained using
the same generic architecture.
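
The delay structure of an LRF can be made concrete with a small sketch. Under the reading that hidden unit n has n links from each of the 4 input units, with delays 1 through n, the LRF has 4 × (1 + 2 + 3 + 4) = 40 input-to-hidden links. The function names and the weight layout below are hypothetical, and the unit's activation function is deliberately left out.

```python
def lrf_link_delays(n_inputs=4, n_hidden=4):
    """Map (hidden unit n, input unit i) -> list of link delays 1..n."""
    return {(n, i): list(range(1, n + 1))
            for n in range(1, n_hidden + 1)
            for i in range(n_inputs)}

def hidden_net_input(signal, weights, n, t):
    """Weighted sum of delayed inputs reaching hidden unit n at time step t.

    signal[t][i] is the activity of input unit i at time t; weights[i][d - 1]
    is the (hypothetical) weight of the link from input i with delay d.
    Inputs before the start of the signal are treated as zero.
    """
    total = 0.0
    for i in range(len(signal[0])):
        for d in range(1, n + 1):
            if t - d >= 0:
                total += weights[i][d - 1] * signal[t - d][i]
    return total
```

A link with delay d makes the unit at time t sensitive to the input column seen d steps earlier, which is how a purely temporal signal gives each hidden unit a window over several adjacent image columns.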

Local detectors can be replicated to tessellate an entire “column” of the image. But note that the tessellation
along the other dimension occurs implicitly when the image is scanned. We refer to a group of identical
and tessellated LRFs as a Feature Detection Module, or FDM. An example of an FDM using 3 LRFs, with
an input unit overlap of 2 and covering a receptive field of 8 inputs, is seen in Figure 6. The dashed box
demarcates the entire FDM.

Figure 5: A generic Local Receptive Field (LRF)

Figure 6: A generic Feature Detection Module (FDM)
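
The tessellation arithmetic can be checked with a few lines of code. The helper below (a hypothetical name) lists the input-unit indices covered by each LRF for a given LRF width and overlap; with a width of 4 and an overlap of 2, 8 inputs are covered by 3 LRFs, as in the Figure 6 example.

```python
def lrf_windows(n_inputs, width=4, overlap=2):
    """Input-unit index ranges covered by each LRF in a tessellated FDM."""
    stride = width - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than width")
    return [range(start, start + width)
            for start in range(0, n_inputs - width + 1, stride)]

# Three LRFs tile a field of 8 inputs: units 0-3, 2-5, and 4-7.
print([list(w) for w in lrf_windows(8)])
```

The same rule applied to a bank of 20 input units gives 9 LRFs per FDM.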

A desirable trait of the feature detectors is their modularity. Each feature detector is composed from an
LRF building block in a simple manner, and the number of useful feature detectors is limited only by the
number of useful LRFs which can be developed. At a different level of modularity, the feature detection
modules can be inserted into a larger network design. During optimization, the FDMs are masked out and
are not considered part of the optimization (although they could be fine-tuned via training, if desired). This
allows the incorporation of robust feature detectors which yield useful information without increasing the
dimensionality of the optimization.

Single  Scan  Networks 

The signal from each scan is processed by what we refer to as a Single Scan Network (SSN). Figure 7
illustrates the configuration of an SSN. In this instance, the SSN
operates on 20x20 images, using two pretrained FDMs (a horizontal and a slash stroke detector), and several
unstructured hidden layers. The input units pass information along links which are either frozen, if they
are part of a pretrained FDM (dashed lines), or trainable, if they are “regular” links (solid lines). A local
hierarchical structure is used to detect higher order features as information propagates towards the output
unit.

A specific SSN used had the following architecture: Each LRF was connected to 4 (out of 20) adjacent
input units. Each input unit to LRF connection consisted of 3 links having delays of 1, 3, and 5 respectively.
Adjacent LRFs had an overlap of two input units. There were four Feature Detection Modules in the first
hidden layer, with each FDM containing 9 LRFs. The second hidden layer was arranged in 4 banks of 6
units, with each bank receiving input from a corresponding FDM. Each unit in a bank received information
from 4 contiguous LRFs via unit delay links. The units in the second layer of the Single Scan Network
were connected to the output unit using links with delays of 1, 3, 5, and 7. Self-recurrent and threshold unit
links were placed on all units. This SSN consisted of 161 units and 1,490 links.

Figure 7: A Single Scan Network Module (SSN), with a digit recognition output unit and 20 input units
receiving information from a particular scan

Single  Digit  Recognition  Networks 

Consider scanning the image of an isolated digit using a left-to-right column-wise scan. Although useful
discriminatory information may be present in the rightmost columns of the image, this information is not
detected by the network until the final time steps. Consequently, it may be more effective to employ multiple
scans in a variety of directions, where each scan feeds information into a separate group of input units. Use
of multiple scans also adds a degree of redundancy, and hence, robustness to the recognition process.

In the multiple-scan situation, information from each scan is processed independently and concurrently
by the SSNs associated with each scan and the output of each SSN is passed to a single output unit. This
complete network is referred to as a Single Digit Recognition Network (SDRN), an example of which is shown
in Figure 8. 80 input units are used in this case, aligned in 4 banks of 20, which receive information from
4 scans. Information from each scan is processed independently in separate SSNs, and the information is
combined at the output level. The dashed box delimits the entire Single Digit Recognition Module. Each
Single Digit Recognition Network is trained to recognize a single digit class, and reject all others.

Figure 8: A Single Digit Recognition Network Module (SDRN)

Figure 9: A Refined Recognition Device (RRD), with 10 output units and 80 input units (20 for each scan)


The  Complete  RRD 


After each Single Digit Recognition Module is trained to recognize its respective digit, all networks are
combined to produce the RRD, capable of recognizing all ten digits. Figure 9 depicts an RRD that uses four
scans.


3.3  Coarse  Recognition  Device  (CRD) 

The Coarse Recognition Device is designed to provide coarse character recognition, in the form of hypothesis
formulation, and to estimate inter-character segmentation points based on the available evidence. The CRD
architecture is a special case of the RRD architecture in which only one Single Scan Network is used within a
Single Digit Recognition Network. This network receives information from a left-to-right scan. As scanning
progresses and more of the image is viewed, confidence in digit classifications is updated. At each time step,
the CRD generates signals for all confidences exceeding a threshold. The CRD therefore produces coarse
recognition estimates. If only one character is present in the image, the CRD produces a signal after it
has observed enough of the character to recognize it. The multi-character case is similar, except that CRD
signals are also interpreted by the PC as hypotheses for inter-character segmentation points.

A  specific  CRD  used  in  our  experiments  possessed  the  following  characteristics:  6  Feature  Detection 
Modules  (FDM)  in  the  first  hidden  layer  contained  9  LRFs  each  (the  LRFs  had  the  same  structure  as 
before).  The  second  hidden  layer  was  arranged  in  6  banks  of  6  units,  with  each  bank  receiving  input  from  a 
corresponding  FDM.  Each  unit  in  a  bank  received  information  from  4  contiguous  LRFs  via  unit  delay  links. 
The  units  in  the  second  layer  of  the  Single  Scan  Networks  were  connected  to  the  output  unit  using  links  with 
delays  of  1,  3,  5,  and  7.  Self-recurrent  links  were  placed  on  all  units.  The  complete  CRD  network  consisted 
of  111  units  and  1,118  links. 

3.4  Procedural  controller  (PC) 

A  traditional  component,  the  Procedural  Controller,  is  used  to  control  system  flow,  incorporate  systematic 
domain  knowledge,  and  make  final  classification  decisions. 

The  PC  passes  each  connected  component  in  the  image  to  the  connectionist  recognition  modules.  For  each 
component,  it  monitors  the  output  of  the  CRD  as  it  assimilates  the  component  in  a  left-to-right  fashion  and 
waits  for  the  CRD  to  build  up  recognition  confidences.  When  one  or  more  thresholds  are  met,  the  PC  sends 
the  most  recently  scanned  portion  of  the  image  to  the  RRD  for  verification.  If  the  RRD  accepts  a  singular 
hypothesis,  a  digit  is  recognized,  the  CRD  is  reset  to  a  zero  state,  and  the  system  continues  scanning  to 
recognize  the  next  digit.  If  the  RRD  rejects  the  estimate,  however,  the  CRD  must  either  continue  processing, 
or  backtrack.  For  example,  if  a  continued  scan  increases  confidence  in  the  current  hypothesis,  it  is  again  sent 
to  the  RRD  for  verification.  If  a  continued  scan  decreases  confidence,  then  thresholds  can  be  altered  to  be 
less  pessimistic  and  a  portion  of  the  image  rescanned. 
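
The control flow described above can be summarized in a simplified sketch. The three callables are hypothetical stand-ins for the trained networks, not interfaces taken from the actual system, and the sketch makes two simplifying assumptions: competing hypotheses are tried in order of decreasing confidence, and a rejection simply lets the scan continue (backtracking and threshold adjustment are omitted).

```python
def recognize_component(columns, crd_step, crd_reset, rrd_verify, thresholds):
    """Scan one connected component and return a list of (digit, end_column).

    All three callables are hypothetical stand-ins for the trained networks:
      crd_step(column)          -> list of ten confidences after this column
      crd_reset()               -> returns the CRD to its zero state
      rrd_verify(region, digit) -> True if the RRD confirms the hypothesis
    """
    results = []
    start = 0                      # first column of the region not yet explained
    crd_reset()
    for t, column in enumerate(columns):
        confidences = crd_step(column)
        hypotheses = [d for d, c in enumerate(confidences) if c >= thresholds[d]]
        if not hypotheses:
            continue
        # Try the most confident hypothesis first (an assumption of this sketch).
        hypotheses.sort(key=lambda d: confidences[d], reverse=True)
        region = columns[start:t + 1]   # most recently scanned portion
        for digit in hypotheses:
            if rrd_verify(region, digit):
                results.append((digit, t))
                start = t + 1
                crd_reset()
                break
        # On rejection the scan simply continues; backtracking and threshold
        # adjustment, which the text also allows, are omitted here.
    return results
```

Keeping this logic in a procedural component, rather than inside the networks, is what makes it easy to add rules such as the domain knowledge discussed next.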

Our current implementation of the word recognition system uses little domain-specific knowledge. This
was for two reasons. First, the purpose of the implementation was primarily to develop the spatio-temporal
connectionist components and benchmark their base discriminatory capabilities. Second, a substantive
amount of work has been done on incorporating domain knowledge into word recognition (Doster, 1977;
Riseman and Hanson, 1974; Shingal and Toussaint, 1979). The following discusses some potential uses of
domain knowledge that would be easy to incorporate in our system design.

First, a ZIP code typically consists of either 5 or 9 digits. This knowledge can be used by the PC to maintain
a running estimate of how many digits remain to be seen, and to use this estimate to guide the segmentation
and recognition process. Second, although it is common to have an overlapping pair of digits, it is rare to
have overlapping triplets. Furthermore, the frequency of two consecutive overlapping digits varies greatly
depending on the class of each digit. Figure 10 (adapted from (Fujisawa et al., 1992)) depicts the frequencies
of touching pair combinations. That is, given a randomly sampled set of touching pairs, the left graph
shows how frequently each digit class can be expected to be in the trailing digit position. The right graph
depicts how frequently each digit class can be expected to be in the leading digit position. The PC can
utilize such knowledge when integrating the signals emanating from the RRD and CRD. In many hand-print
domains, only a subset of all possible strings is legal and hence a dictionary of legal strings can be
made available. This permits the utilization of predictive dependencies between characters, derived from
statistical analysis of the dictionary (e.g., (Bledsoe and Browning, 1959)) and the use of contextual word
postprocessing algorithms (e.g., (Doster, 1977; Shingal and Toussaint, 1979)).

Figure 10: Frequencies of touching hand-printed digit pairs

The incorporation of such domain knowledge is fairly straightforward when a procedural component is used.
In particular, our approach allows the PC to interact with the connectionist networks during recognition,
making knowledge-driven recognition possible.

4  Training  and  testing  methodology 

4.1  Datasets 

A  good  dataset  for  hand-printed  digit  and  word  recognition  should  be  widely  available  and  voluminous, 
with  the  number  of  authors  approaching  the  number  of  images.  Furthermore,  the  authors  should  be  from  a 
diverse  background,  and  be  unaware  that  their  printing  will  be  used  to  train  and  test  a  recognition  device. 
The  “United  States  Postal  Service  Office  of  Advanced  Technology  Handwritten  ZIP  Code  Database  (1987)” 
meets  all  these  requirements  and  was  made  available  for  research  by  the  Office  of  Advanced  Technology, 
United States Postal Service. The database contains approximately 2,400 grayscale images of hand-printed
five and nine digit ZIP codes, scanned from letters passing into the Buffalo, New York, Post Office. To
simplify  bookkeeping,  only  five  digit  ZIP  code  images  were  used.  Next,  the  images  were  converted  from 
grayscale  to  binary  images.  Finally,  each  ZIP  code  image  was  broken  down  into  five  individual  digits  by 
making  linear  slices  between  consecutive  digits,  without  removing  stray  marks  or  extended  strokes. 

A  second  database  containing  pre-segmented  hand-printed  characters,  the  NIST  Special  Database  3,  was 
made  available  for  research  by  the  National  Institute  of  Standards  and  Technology.  The  database  contains 
313,389  images  of  isolated  alpha-numerals,  including  223,125  digits,  drawn  from  a  multi-authored  set  of  2,100 
images  of  full-page  hand-printed  forms. 

4.2  Division  of  Datasets 

The USPS database was used for both training and testing the RRD and CRD, and testing the word
recognition system. The database contains ZIP code images with serial numbers ranging from bd-0001 to
bd-2636.^ Prior to viewing the database, ZIP codes with serial numbers from bd-0001 to bd-2000 were
designated as training images, while ZIP codes with numbers between bd-2001 and bd-2636 were designated
as test images. The training set consisted of 1,090 five digit ZIP code images. Of the 617 ZIP code images
set aside for testing, only 540 were eventually used. 59 images were excluded because they were 9 digit ZIP
codes. One image contained only 4 digits and was not used, while another which was incorrectly coded was
also discarded. Another 16 images contained dark lines running across them due to postal marks and scanner
anomalies and were discarded. This division yielded 5,450 isolated ZIP code digit images for use in training
the RRD and CRD, 2,700 isolated digit images for use in testing the RRD and CRD, and 540 complete ZIP
code images for testing the word recognition system. Figure 11 illustrates the first 90 ZIP code images in
the test set.

In addition to the above, a set of approximately 16,000 digit images, for use in training the RRD and
CRD, was randomly sampled from the 223,125 isolated digit images in the NIST database. The remaining
set of approximately 207,000 images was reserved for testing the RRD and CRD.

Training  set  for  RRD 

The RRD is expected to reject all images which do not contain isolated digits. Since this includes non-digit
blobs, it was necessary to incorporate such images into the training set as negative examples. Therefore,
additional training data for disjoint strokes was synthesized. The RRD is also expected to reject components
containing multiple digits. Therefore, a dataset containing multiple digits was created. Finally, the CRD may
signal a recognition hypothesis before it has completely observed the leftmost digit. Consequently, a portion
of the component containing only a partial digit may be sent to the RRD for inspection. The RRD should
reject such incomplete digit images and accept more fully formed digits. In view of this, another dataset
containing partial digits was constructed.

^Images from bd-1001 to bd-1500 are stored on a separate tape and designated by the USPS as test images for complete
ZIP code recognition systems. These images were not used.

Figure 11: First 90 ZIP codes in the USPS test set

Synthetic Data for Disjoint Strokes and Partial Images: A total of 44 images containing pieces of
broken “5”s and 405 images containing partial digits were synthesized as negative training instances. To
produce the 44 “5” pieces, 22 images of broken “5”s were selected from the USPS training set. The digit “5”
was chosen because it seems to be the most common digit printed using multiple strokes. A visual inspection
of a sample of 500 ZIP codes revealed a total of 76 instances containing a broken “5”, as opposed to only 2
instances of a broken “4”.

To produce the 405 partial digit images used to simulate partial data which might be provided by the
CRD in the form of premature conjectures, the following steps were taken: (i) For each digit (except 1), a
set of 45 images containing the digit was randomly sampled from the USPS training set. (ii) For each of the 405
images, a block of contiguous columns on the right hand side of the image was deleted. A random number
of columns ranging from 33% to 50% of the total image width was removed. Finally, a completely blank
image was also included to form a set of 450 negative examples of disjoint strokes and partial images.

Synthetic Data for Multiple Digits: A set of 500 images of overlapping digit pairs was synthesized. For
each of the 100 possible digit pair orderings XY (e.g., 01, 02, ..., 99), 5 images were generated as follows:
(i) two digits, X and Y, were randomly sampled, with replacement, from the USPS training set, (ii) the
digit images were separately skew-normalized and scaled to a uniform height, (iii) the X and Y images were
horizontally juxtaposed to form a single image (some images were further “squashed” by a random amount
between 0% and 10% of their width to simulate large overlaps), and (iv) the XY image was then scaled to
fit in a 20x20 image, preserving the aspect ratio, and skeletonized. 450 of these images were retained in the
negative training set.^
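
The partial-digit synthesis described above (removing a random 33% to 50% of the columns on the right-hand side) can be sketched as follows. The function name is hypothetical, and blanking the removed columns rather than shrinking the image is an assumption; the text says only that the columns were deleted.

```python
import random

def truncate_right(image, rng, lo=0.33, hi=0.50):
    """Blank out a random block of right-hand columns covering lo..hi of the width.

    Keeping the image width fixed and filling with off-bits is an assumption.
    """
    width = len(image[0])
    n_removed = round(rng.uniform(lo, hi) * width)
    keep = width - n_removed
    return [row[:keep] + [0] * n_removed for row in image]

# Example: truncate a solid 20x20 block with a seeded generator.
rng = random.Random(0)
partial = truncate_right([[1] * 20 for _ in range(20)], rng)
```

For a 20-column image this removes between 7 and 10 columns, so the RRD sees roughly the left half to two thirds of the digit, much as it would after a premature CRD conjecture.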

The Complete RRD Training Set: For each Single Digit Recognition Network, a total of 2025 positive
and 2025 negative training examples were used. For the positive instances, 425 samples were drawn from
the USPS dataset, and 1600 samples were drawn from the NIST dataset. For negative instances, 125 images
of each digit class (other than the class being learned) were used in conjunction with 450 partial images
and 450 multiple digit images. Of the negative examples of digits of the class not being learned, 45 were
randomly sampled from the USPS set and 80 were sampled from the NIST set.

Training set for CRD

Unlike the RRD, the CRD need not be explicitly concerned with rejecting images of partial or multiple digits.
Consequently, no synthesized images of partial or multiple digits were required as negative examples in the
CRD training. Earlier experiments, however, had demonstrated the propensity for a single scan network
to focus on strokes parallel to the scanning direction. To offset this effect, images containing a horizontal
stroke across the entire image were used as negative examples. In addition, the empty image and 9 images
containing a sparse number of random dots were used as negative examples.

^In future work, real-world samples should be collected for training, instead of resorting to synthesizing data. This would
not only provide a more realistic sample of overlaps, but also would be reflective of the distribution of the types of overlaps.

In addition to the above, the USPS and NIST data used to train the RRD was also used to train the CRD.
Thus 2025 images were used as positive single digit recognition examples (425 USPS and 1600 NIST images)
and 2055 images were used as negative examples. These consisted of 225 examples of each other digit (112
from USPS, 113 from NIST), the empty image, 20 images of horizontal strokes, and 9 random dot images.
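
The training set sizes quoted for the RRD and CRD can be cross-checked with simple bookkeeping. The variable names below are illustrative; the counts are those given in the text, for one digit network with nine other digit classes.

```python
N_OTHER_CLASSES = 9  # digit classes other than the one being learned

# RRD: per Single Digit Recognition Network
rrd_positive = 425 + 1600                        # USPS + NIST
rrd_other_digits = N_OTHER_CLASSES * (45 + 80)   # USPS + NIST per other class
rrd_negative = rrd_other_digits + 450 + 450      # + stroke/partial + multi-digit images

# CRD: per digit network
crd_positive = 425 + 1600                        # USPS + NIST
crd_other_digits = N_OTHER_CLASSES * (112 + 113) # USPS + NIST per other class
crd_negative = crd_other_digits + 1 + 20 + 9     # + empty, horizontal strokes, random dots

print(rrd_positive, rrd_negative, crd_positive, crd_negative)
# prints: 2025 2025 2025 2055
```

The totals agree with the figures stated above: balanced positive and negative sets for the RRD, and a slightly larger negative set for the CRD.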

4.3  Data  Representation 

Two methods of representing characters for hand-print recognition devices are typically employed:
feature-level and pixel-level. In feature-level representation, features such as strokes or edges of various orientations
are extracted from an image. The set of features is decided a priori and/or through automatic selection
from a large pool of features using, for example, information-theoretic measures. In pixel-level
representation, a system operates directly on the pixels of images. Images are, however, typically preprocessed
to remove noise and normalize certain types of variations. In general, the structural (visual) integrity of
the character in an image is retained throughout the preprocessing stage. Pixel-level representation forces a
learning method to acquire the features necessary for discrimination. In our work we made use of pixel-level
input representations.*

Preprocessing 

Preprocessing an image can enhance the recognition capabilities of a system by normalizing certain variations.
The following preprocessing steps were applied to isolated digit images.

Low  pass  filtering:  High  frequency  noise,  or  “pepper”  noise,  commonly  occurs  in  imaging.  To  reduce 
the  adverse  effects  of  pepper  noise,  a  mask,  shown  in  Figure  12,  was  convolved  across  each  image.  The 
convolution  smoothed  the  image,  and  subsequent  binarization  eliminated  stray  pixels.  The  binarization 
threshold  was  set  such  that  on-bits  were  converted  to  off-bits  unless  the  weighted  sum  of  the  mask  exceeded 
1/2. 
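
The filtering and binarization step can be sketched as below. The 3x3 mask used here is a hypothetical stand-in, since the actual weights are those shown in Figure 12, and the rule that off-bits always stay off is one reading of the text.

```python
# Hypothetical 3x3 smoothing mask with weights summing to 1; the actual mask
# is the one shown in Figure 12.
MASK = [[1 / 16, 2 / 16, 1 / 16],
        [2 / 16, 4 / 16, 2 / 16],
        [1 / 16, 2 / 16, 1 / 16]]

def remove_pepper_noise(image, mask=MASK, threshold=0.5):
    """Keep an on-bit only if the mask-weighted sum around it exceeds threshold."""
    rows, cols = len(image), len(image[0])
    half = len(mask) // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue  # off-bits stay off under this reading of the text
            total = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += mask[dr + half][dc + half] * image[rr][cc]
            out[r][c] = 1 if total > threshold else 0
    return out
```

With this particular mask an isolated pixel receives a weighted sum of 0.25 and is removed, while pixels inside a solid blob are kept. How thin strokes fare depends on the real mask weights.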

Skew normalization: One source of variance produced by differences in handedness and style is the skew
of print. Skew in hand-print can be viewed as a distortion produced by an author favoring, and perhaps
elongating, strokes in certain orientations which (often systematically) deviate in angle from prototypical
stroke orientations. A moment-based transformation to correct individual character skew (Bakis et al., 1968)
was applied to all isolated digit images. This technique is equivalent to shifting rows of bits horizontally to
remove the skew.
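
One common way to realize a moment-based skew correction by row shifting is sketched below. It estimates the horizontal drift per row from second-order central moments and shifts each row to cancel it. This is offered as an illustration of the idea, not as the exact transformation of (Bakis et al., 1968); the function name is hypothetical.

```python
def deskew(image):
    """Shear a binary image horizontally so that its x-y central moment vanishes."""
    points = [(y, x) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not points:
        return [row[:] for row in image]
    n = len(points)
    y_mean = sum(y for y, _ in points) / n
    x_mean = sum(x for _, x in points) / n
    mu_xy = sum((x - x_mean) * (y - y_mean) for y, x in points)
    mu_yy = sum((y - y_mean) ** 2 for y, _ in points)
    slope = mu_xy / mu_yy if mu_yy else 0.0   # horizontal drift per row
    width = len(image[0])
    out = [[0] * width for _ in image]
    for y, x in points:
        # Shift each row back by the drift predicted for that row.
        new_x = x - round(slope * (y - y_mean))
        if 0 <= new_x < width:      # pixels sheared off the canvas are dropped
            out[y][new_x] = 1
    return out
```

Because every pixel in a row is shifted by the same whole number of columns, strokes keep their thickness and only the slant changes.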

*The use of pretrained feature detectors within the network can, however, be thought of as a “hybrid” of pixel and feature
based representations.

Figure 12: Low pass filter mask used to remove pepper noise

Figure 13: Examples of ZIP code digit images before and after preprocessing

Size Normalization: Isolated digit images for training the RRD were encased in a bounding box by
trimming off surrounding white space and scaled down to a 20x20 image. The aspect ratio of the character
was preserved by padding with white space, if necessary. A nearest neighbor method which sampled pixels
in the original image at regular intervals was used to perform the scaling (Hou, 1983). In the case of images
used for training the CRD, the image was scaled down to fit in a rectangle containing twenty rows of pixels
while preserving the aspect ratio. The number of columns therefore varied, depending on the width of the
digit. After scaling, the image was skeletonized. This scaling routine is used for the CRD since it processes
arbitrarily long inputs.
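
The two size normalization variants can be sketched as follows. The function names are hypothetical, centering the scaled digit within the 20x20 frame is an assumption, and the simple index-sampling rule stands in for the nearest neighbor method of (Hou, 1983).

```python
def bounding_box(image):
    """Crop away surrounding white space; assumes at least one on-bit."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor scaling: sample the source at regular intervals."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def normalize_for_rrd(image, size=20):
    """Crop, scale the longer side to `size`, and pad to size x size."""
    box = bounding_box(image)
    h, w = len(box), len(box[0])
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    scaled = resize_nearest(box, new_h, new_w)
    top, left = (size - new_h) // 2, (size - new_w) // 2   # centering is assumed
    out = [[0] * size for _ in range(size)]
    for r in range(new_h):
        for c in range(new_w):
            out[top + r][left + c] = scaled[r][c]
    return out

def normalize_for_crd(image, height=20):
    """Crop and scale to `height` rows; the number of columns varies."""
    box = bounding_box(image)
    h, w = len(box), len(box[0])
    new_w = max(1, round(w * height / h))
    return resize_nearest(box, height, new_w)
```

The RRD variant always produces a fixed 20x20 frame, whereas the CRD variant fixes only the height, so wide digits or touching pairs simply take more time steps to scan.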

Skeletonization: Skeletonization was employed to remove variation caused by differing thicknesses of
writing styli and image quantizations. Skeletonization erodes pixels from a binary image until strokes of
only a single pixel width remain. The SPTA skeletonization method (Naccache and Shinghal, 1984) was
used.

Examples of images before and after filtering, deskewing, scaling, and skeletonization are shown in Figure 13.
The digits in the top line are the original images and the digits in the bottom line are the same digits after
preprocessing.

4.4 Target Functions

Since  spatio-temporal  networks  generate  an  output  at  each  time  over  the  assimilation  of  the  input,  one  must 
specify,  for  each  training  example,  the  desired  (target)  activity  of  output  units  at  each  step  of  processing. 
Several  classes  of  target  functions  were  considered.  These  included  linear,  step,  Gaussian,  and  sigmoid 
functions.  Based  on  our  experience,  the  following  asymmetric  sigmoid  target  was  used:  the  target  value  at 
the  first  time  step  for  both  positive  and  negative  examples  was  0.05.  At  subsequent  times  t,  the  target  value 
for  positive  examples  followed  a  rising  sigmoid  curve,  while  the  target  value  for  negative  examples  stayed 
constant at 0.05.^9

Intuitively,  for  positive  examples,  the  confidence  in  a  particular  classification  should  increase  slightly 
with  each  time  step  near  the  onset  of  the  image.  By  the  midrange  of  the  image,  or  slightly  thereafter, 
enough  information  should  have  been  assimilated  to  classify  the  image  with  some  confidence.  By  the  end  of 
assimilation,  the  network  should  be  certain  that  it  has  seen  a  particular  character  class. 

Since digits of a particular class may contain instances of widely varying widths and heights, it may be
useful  to  tailor  the  target  functions  to  individual  examples.  In  the  case  of  the  RRD,  digits  were  centered 
in  a  20x20  image  for  recognition.  A  fixed  sigmoid  target  function  was  found  to  be  adequate  for  positive 
examples.  Since  multiple  orthogonal  scans  were  utilized,  at  least  one  single  scan  network  was  able  to  receive 
image  information  to  satisfy  a  fixed  target. 

Since the CRD uses only one scan and needs to assimilate images of differing widths, it cannot use a homogeneous
target function over all examples. Furthermore, the CRD must possess shift-invariant characteristics,
allowing it to ignore arbitrary amounts of white space before reaching the onset of a digit. Therefore, the
target function was chosen to be a sigmoid with its onset, inflection point, and duration customized for each
example. To enforce shift invariance, the left side of each positive example was “padded” with a random
amount of white space (from 1 to 30 contiguous columns). The target response during the area of white
space was 0.05. The target response during assimilation of the actual digit was a sigmoid, rising from 0.05
at the onset to 0.95 at its end, with its inflection point placed 60% through the extent of the example.
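A per-example CRD target of this kind might be generated as in the sketch below. The steepness of 10 and the inflection point at 60% come from the text; the affine rescaling that pins the curve to exactly 0.05 at the onset and 0.95 at the end is an assumption, as are the function names, since the report does not say how those endpoint values were obtained.

```python
import math
import random


def sigmoid(x, m=10.0, c=0.6):
    """Logistic curve with steepness m and inflection point at x = c."""
    return 1.0 / (1.0 + math.exp(-m * (x - c)))


def crd_target(digit_width, pad, m=10.0, c=0.6, low=0.05, high=0.95):
    """Target value per column: `pad` columns of white space, then the digit."""
    s0, s1 = sigmoid(0.0, m, c), sigmoid(1.0, m, c)
    target = [low] * pad                       # flat response over white space
    for t in range(digit_width):
        x = t / (digit_width - 1) if digit_width > 1 else 1.0
        # Assumed rescaling so the curve runs exactly from `low` to `high`.
        frac = (sigmoid(x, m, c) - s0) / (s1 - s0)
        target.append(low + (high - low) * frac)
    return target


# Example: a 14-column digit preceded by a random amount of padding.
pad = random.randint(1, 30)
example_target = crd_target(digit_width=14, pad=pad)
```

Negative examples would simply use a constant target of 0.05 over the whole input.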

4.5  Training 

Training was done using the second-order quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
(Luenberger, 1984) using (i) GRADSIM, a system for applying nonlinear gradient optimization techniques
to train spatio-temporal connectionist networks from examples (Watrous, 1988), and (ii) GRAD-CM2, a
data-parallel version of GRADSIM implemented on a Connection Machine CM-2 (Fontaine, 1992). In general,
runs were terminated when (1) MSE had fallen below 0.0025, and (2) error reductions were insignificantly
small over a large number of objective function and gradient evaluations.

^9 The output values lie in the interval [0,1].

4.6 Network Scoring

The following methods were used to make classification decisions, based on the output of the network over
time. 

Integrated  Activation 

Since the output unit was trained to respond with increasing activity upon presentation of a positive
example,  and  respond  with  non-increasing  activity  to  negative  examples,  a  simple  “unit  score”  based  upon 
integrated  activation  was  used.  The  score  for  a  given  output  unit  was  determined  by  summing  the  individual 
activations  at  each  time  step  over  assimilation  of  the  entire  image,  and  then  normalizing  by  the  extent  of 
the  image. 

In  the  case  where  one  output  unit  was  employed  per  class  to  be  recognized  (eg,  the  RRD),  a  classification 
decision  was  made  using  a  simple  winner-take-all  approach,  wherein  the  image  is  classified  as  belonging  to 
the  class  corresponding  to  the  output  unit  which  generated  the  largest  time-normalized  integrated  output. 
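The scoring rule can be written compactly. The sketch below assumes each output unit's response is available as a list of activations, one per time step; the dictionary layout and function names are illustrative only.

```python
def unit_score(activations):
    """Integrated activation, normalized by the extent of the image."""
    return sum(activations) / len(activations)


def classify(outputs):
    """Winner-take-all over time-normalized integrated outputs.

    outputs: dict mapping a class label to that unit's activation trace.
    Returns the winning label and the score of every unit.
    """
    scores = {label: unit_score(trace) for label, trace in outputs.items()}
    winner = max(scores, key=scores.get)
    return winner, scores
```

Because the sum is divided by the number of time steps, images of different widths yield comparable scores.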

The  integrated  outputs  of  the  units  can  be  interpreted  as  probability  estimations  by  normalizing  the  values 
to  obey  the  laws  of  probability  (Bridle,  1990).  Although  this  transformation  does  not  affect  single  object 
classification  (in  a  winner-take-all  sense),  it  is  useful  in  the  integration  of  various  components  of  a  recognition 
system  (eg,  the  RRD,  CRD,  and  PC).  The  statistical  properties  of  the  underlying  written  language  can  more 
readily  be  integrated,  and  communication  between  the  CRD  and  RRD  can  be  viewed  as  joint  probabilities, 
as  opposed  to  suggestive  signals.  Normalization  to  estimate  probabilities  was  used  in  the  word  recognition 
system. 
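One simple way to turn the integrated outputs into values that sum to one is shown below. The report does not spell out the normalization it used; dividing each score by the total is an assumption made for illustration, and the cited work may prescribe a different transformation.

```python
def to_probabilities(scores):
    """Rescale non-negative unit scores so that they sum to 1 (assumed scheme)."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}
```

The ordering of the units is unchanged by this rescaling, so the winner-take-all decision is unaffected.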

Rejection  Criterion 

It is often of practical importance to assess the performance of a recognition system by deriving the percentage
of test images that must be rejected as unclassifiable in order to achieve a lower error rate on the remaining
images. Consequently, a rejection criterion was defined. Considering time-normalized integrated activation,
let $A_h$ be the highest activation of the N output units, and let $A_s$ be the second highest activation. A
measure of classification confidence, C, was defined to be $C = (1 - A_s)/(1 - A_h)$.

Since $A_h, A_s \in (0, 1)$ and $A_s < A_h$, we have $C > 1$. Larger values of C indicate more confident
classifications. The rejection criterion was then defined such that for some $\epsilon > 0$, if $C < (1 + \epsilon)$,
then the image was rejected as being unclassifiable.
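The rejection test is small enough to state directly in code. The sketch below takes the time-normalized integrated activations of all output units as a list; the function names are hypothetical, and the threshold is read as 1 plus a positive margin, which is the only reading under which a confidence that always exceeds 1 can lead to rejection.

```python
def confidence(scores):
    """C = (1 - second highest) / (1 - highest), for scores in (0, 1)."""
    ranked = sorted(scores, reverse=True)
    highest, second = ranked[0], ranked[1]
    return (1.0 - second) / (1.0 - highest)


def is_rejected(scores, eps):
    """Reject the image when the confidence falls below 1 + eps."""
    return confidence(scores) < 1.0 + eps
```

Raising `eps` rejects more images; sweeping it from zero upward traces out a rejection-versus-accuracy curve.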

Figure 14: USPS test images misclassified by the RRD

5  Results 

5.1  Refined  recognition  device  (RRD) 

On  the  NIST  test  set  of  207,000  isolated  digit  images,  an  accuracy  of  96.5%  was  achieved  at  a  0%  rejection 
rate.  On  the  USPS  test  set  of  2,700  images,  an  accuracy  of  96.0%  was  obtained  with  no  rejections.  All 
USPS  test  set  misclassifications  are  shown  (in  preprocessed  form)  in  Figure  14.  The  number  before  the 
slash  below  each  image  is  the  true  classification  and  the  number  after  the  slash  is  the  (incorrect)  RRD 
classification. The USPS accuracy is comparable to other reported results using test samples drawn from
the USPS database (e.g., Denker et al., 1989; Knerr et al., 1992; Le Cun et al., 1990). Caution must be
taken in making comparisons, however, since true performance measures cannot be obtained without using
identical test databases, visited only once by each recognition system.

Figure 15: Percent of USPS digit test set rejected vs RRD accuracy


In real-world implementations, a user is often willing to allow a system to reject a portion of samples as
being unclassifiable in exchange for improved accuracy. The performance of the RRD, as an increasingly
larger percentage of the USPS test images are rejected, is shown in Figure 15. The rejection criterion used
was based on the ratios between the highest and second highest time-integrated unit activations. Figure 15
was derived by incrementing a rejection threshold, $\epsilon$ (cf. Section 4.6), until all images were rejected. The
steep rise in accuracy with a small number of rejections is highly desirable, and a 99% accuracy was obtained
upon rejecting 9.5% of the images. A detailed analysis of the RRD performance may be found in (Fontaine,
1993).

Threshold  Case 1     Case 2              Case 3              Accuracy
Value      Incorrect  Correct  Incorrect  Correct  Incorrect
0.4        48         720      48         1746     138        91.3%
0.5        62         335      13         2149     141        92.0%
0.6        68         136      10         2354     132        92.2%

Table 1: CRD hypothesis results on the USPS digit test set

5.2 Coarse recognition device (CRD)

The  CRD  was  trained  on  single  digits  in  order  to  enable  it  to  make  a  good  hypothesis  concerning  the  first 
digit  it  encounters  as  it  scans  an  image  in  a  left-to-right  fashion;  the  focus  was  not  on  constructing  a  CRD 
capable  of  stand-alone  recognition  of  digits.  Consequently,  the  CRD  was  evaluated  on  images  containing 
single  digits  (the  USPS  set  of  2,700  digits)  using  two  modes  of  operation.  The  first  mode  measured  its  base 
recognition  capabilities.  The  second  mode  was  geared  towards  inspecting  the  ability  of  the  CRD  to  formulate 
hypotheses  using  a  simple  threshold  method. 

In the first mode, classification was performed by choosing the output unit which yielded the highest
time-normalized integrated activation. This allowed for variations in the width of each digit without explicitly
changing  the  target  function  for  each  test  image.  The  aim  was  to  evaluate  the  base  discriminatory  capability 
of  the  network  for  isolated  digit  recognition.  On  the  USPS  set  of  2,700  digits,  the  CRD  achieved  an  accuracy 
of  94.4%  with  no  rejections.  Samples  in  error  are  depicted  in  Figure  16  (for  ease  of  viewing,  the  samples  are 
shown  scaled  to  fit  in  a  bounding  box). 

The  second  mode  of  operation  was  geared  towards  evaluating  the  capability  of  the  CRD  to  produce 
hypotheses.  Note  that  the  formulation  of  classification  hypotheses  is  different  from  that  of  segmentation 
point  hypotheses  (which  is  detailed  in  Section  5.3).  The  same  dataset  was  used,  but  instead  of  making  a 
classification  based  on  integrated  activation  (after  an  entire  image  is  assimilated),  a  classification  was  made 
when  any  output  unit  activation  exceeded  a  predetermined  threshold.  After  a  classification  was  made,  the 
CRD  was  reset  and  scanning  continued.  Although  the  first  classification  produced  is  of  primary  interest, 
resetting  the  network  and  continuing  the  scan  helped  gauge  the  robustness  of  the  CRD  to  ignore  partial 
images.  The  decision  process  in  the  second  mode  of  operation  resulted  in  three  possibilities  for  classification: 
(1)  no  output  ever  exceeded  threshold,  and  hence  no  classification  was  made,  (2)  a  classification  was  made, 
the  network  was  reset,  and  one  or  more  other  classifications  were  subsequently  made,  and  (3)  exactly  one 
classification  was  made.  Case  (1)  is  undesirable  in  the  context  of  the  overall  system.  Case  (2)  is  acceptable  if 
the  first  digit  recognized  is  the  actual  digit  being  scanned.  Likewise,  case  (3)  is  acceptable,  if  the  classification 
is correct. Table 1 shows the CRD results using various output unit threshold values. The accuracy was
computed by summing the correct counts for Case 2 and Case 3 and dividing by the total number of images (2700).
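The accuracy figures in Table 1 can be re-derived from the per-case counts. The short check below uses the counts reported in the table; the variable names and the tuple layout (Case 1 incorrect, Case 2 correct, Case 2 incorrect, Case 3 correct, Case 3 incorrect) are chosen for this example.

```python
TOTAL_IMAGES = 2700

# threshold -> (case1_incorrect, case2_correct, case2_incorrect,
#               case3_correct, case3_incorrect)
TABLE_1 = {
    0.4: (48, 720, 48, 1746, 138),
    0.5: (62, 335, 13, 2149, 141),
    0.6: (68, 136, 10, 2354, 132),
}


def hypothesis_accuracy(counts, total=TOTAL_IMAGES):
    """Percentage of images whose first hypothesis was correct (Case 2 or 3)."""
    _, case2_correct, _, case3_correct, _ = counts
    return 100.0 * (case2_correct + case3_correct) / total


for threshold, counts in TABLE_1.items():
    assert sum(counts) == TOTAL_IMAGES      # every image falls in one case
    print(threshold, round(hypothesis_accuracy(counts), 1))
```

Each row sums to 2700, and the computed values round to 91.3, 92.0, and 92.2.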

Figure 16: USPS digit test images misclassified by the CRD

Figure 17: Examples of premature, good, and late segmentation points


The  results  were  as  expected,  with  more  rejections  occurring  with  higher  thresholds,  accompanied  by  fewer 
multi-digit classifications. Also, first digit classification accuracy seems acceptable. A detailed analysis of
the RRD performance may be found in (Fontaine, 1993).

5.3  Word  recognition  system 

Before  presenting  the  performance  results  we  outline  the  functioning  of  the  word  recognition  system  as 
currently  implemented  and  discuss  the  generation  of  segmentation  points  based  on  the  CRD  response. 

Threshold  Derivation 

The  following  questions  need  to  be  addressed  in  the  spatio-temporal  framework:  At  what  point  during  the 
CRD  inspection  of  an  image  is  there  enough  evidence  to  hypothesize  a  classification?  How  much  of  the  image 
should  be  sent  to  the  RRD,  i.e.,  where  should  a  segmentation  be  made?  Ideally,  a  hypothesis  should  be  made 
as soon as possible during the inspection of an image in order to expedite recognition and the segmentation
point should correspond to the end of the digit being assimilated.

Figure  17  depicts  three  examples  of  hypothesized  segmentation  points.  The  leftmost  example  shows  a 
premature  segmentation  point,  the  center  example  shows  a  good  segmentation  point,  and  the  rightmost 
example  shows  a  late  segmentation  point.  Hypothesizing  an  early  segmentation  point  has  two  disadvantages. 
First,  system  throughput  is  decreased  since  extra  interaction  must  occur  between  the  CRD  and  RRD  to  refine 
the  segmentation  point.  Second,  if  the  RRD  accepts  a  premature  segmentation  point,  the  CRD  is  forced 
to  examine  the  remaining  portion  of  the  digit  and  may  posit  “ghost”  hypotheses.  For  example,  suppose  a 
segmentation point is hypothesized by the CRD roughly 75% through assimilation of the digit “8”, as in
the leftmost example in Figure 17. If the RRD accepts the hypothesis, the CRD is forced to examine the
remaining  25%  of  the  image  and  may  hypothesize  the  presence  of  a  “3”  (the  RRD  was  explicitly  trained 
to  reject  partial  images  for  exactly  this  reason).  If  a  segmentation  point  is  positioned  too  late,  such  that 
it  overruns  the  next  digit,  both  the  RRD  verification  of  the  current  digit  and  the  CRD  assimilation  of  the 
next  digit  can  be  affected.  Fortunately,  there  exists  a  simple  method  for  producing  fairly  good  segmentation 
points. 

Using the target function for positive instances during CRD SDRN training as a model for future response,
the width of a test digit being assimilated may be estimated at any point during recognition. The target
function used for positive examples during SDRN training was a sigmoid:


$$d(x) = \frac{1}{1 + e^{-M (x - C)}} \qquad (1)$$


where d(x) is the desired output of the SDRN when a given fraction, x, of the image had been assimilated
($x \in [0, 1]$). M is a (positive real) value controlling the shape of the sigmoid (M = 10.0 was used for the
RRD  and  CRD),  and  C  is  a  fixed  fraction  of  the  length  of  the  target  specifying  the  location  of  the  inflection 
point.  If  C  is  0.5,  the  output  unit  response  will  exceed  0.5  after  half  of  a  positive  example  is  assimilated.  If  it 
is  assumed  that  positive  test  examples  will  respond  according  to  this  target,  then  half  of  a  digit’s  width  will 
have  been  witnessed  by  the  network  when  its  output  unit  activation  exceeds  0.5.  This  width  may  be  doubled 
to  serve  as  an  estimate  of  the  digit’s  actual  width  and,  consequently,  a  segmentation  point.  Although  this 
scheme  does  not  produce  exact  segmentation  points  for  all  cases,  it  was  found  to  approximate  the  point 
reasonably  well. 
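The width projection amounts to dividing the number of columns seen so far by the inflection fraction. The sketch below generalizes the doubling rule described for C = 0.5 to an arbitrary C; that generalization, the function name, and the assumption that the onset column of the digit is known are not taken from the report.

```python
def project_segmentation_point(onset, crossing, c=0.5):
    """Estimate the column just past the end of the current digit.

    onset:    index of the first column of the digit (assumed known).
    crossing: index of the column at which the output first exceeded
              the target value at the inflection point.
    c:        fraction of the digit assumed assimilated at that moment.
    """
    columns_seen = crossing - onset + 1
    estimated_width = round(columns_seen / c)   # c = 0.5 doubles the width
    return onset + estimated_width
```

For example, a digit starting at column 3 whose output crosses the threshold at column 12 has shown 10 columns, so with C = 0.5 its width is estimated at 20 columns and the segmentation point is projected at column 23.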

In addition to projecting the segmentation point, one also needs to determine the level of output activation
at which the CRD should hypothesize the presence of a digit. The current implementation uses a simple
threshold method and makes a hypothesis when a CRD output unit exceeds a predefined threshold, $\theta$. The
value of $\theta$ should be large enough to reduce false positives, yet small enough to allow the CRD to make
enough conjectures. In all experiments, $\theta$ was set to 0.5 (the target value at the inflection point). In
future work, we plan to use a dynamic value of $\theta$ which is derived automatically based on performance results
on an appropriate set of training data.


Processing  by  the  word  recognition  system 

Given  a  binary  image  containing  one  or  more  digits,  recognition  progresses  in  three  stages:  (i)  a  component 
recognition  stage  in  which  the  RRD  tries  to  identify  whether  any  connected  components  are  well-formed 
digits,  (ii)  a  rejected  component  analysis  stage  in  which  the  CRD  and  RRD  interact  to  classify  the  remaining 
components  of  the  image,  and  (iii)  a  decision  making  stage  to  assign  a  classification  and  confidence  to  the 
image  as  a  whole. 


Stage  1:  Connected  Component  Recognition 

Connected  components  are  found  and  each  connected  component  in  the  image  is  passed  to  the  RRD.  The 
RRD  acceptance  criterion  is  set  pessimistically,  since  it  is  desired  to  recognize  only  those  components  which 
can  confidently  be  recognized  as  digits.  The  threshold  was  set  such  that  the  RRD  recognized  isolated  digits 
with  a  99.5%  accuracy,  rejecting  16.8%  of  the  images.  At  the  end  of  Stage  1,  the  RRD  acceptance  threshold 
was  altered  to  be  more  accepting  in  Stage  2.  The  threshold  was  set  to  obtain  99.0%  accuracy  at  a  9.5% 
rejection rate.

After the components are sent to the RRD, the skew of each recognized component is weighted by its
mass, and an average measure of skew, $\mu$, is produced. The remaining components are deskewed by a factor
of $\mu$. In cases where no component was recognized (which rarely occurred), the image is not deskewed.
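The mass-weighted average can be sketched as follows. How skew is measured is not described in this section, so each component is represented here simply as a (skew, mass) pair, with mass assumed to be the number of ink pixels; the function name is hypothetical.

```python
def average_skew(recognized):
    """Mass-weighted mean skew over components accepted by the RRD.

    recognized: list of (skew, mass) pairs.
    Returns None when nothing was recognized, in which case the image
    is left without deskewing.
    """
    total_mass = sum(mass for _, mass in recognized)
    if total_mass == 0:
        return None
    return sum(skew * mass for skew, mass in recognized) / total_mass
```

Weighting by mass lets large, confidently recognized digits dominate the estimate over small fragments.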

Stage  2:  Rejected  Component  Recognition 

The  components  not  accepted  in  Stage  1  by  the  RRD  are  inspected  by  the  CRD  in  Stage  2.  One  or  more 
components  are  joined  if  there  is  not  significant  columnar  white  space  between  them  (where  “significant”  is 
taken  to  be  a  fraction  of  the  height  of  the  image — 15%  in  the  current  implementation). 
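The joining rule can be illustrated on column extents alone. In this sketch each component is reduced to its inclusive (left, right) column span, which is a simplification introduced for the example; the 15% fraction is the value quoted in the text.

```python
def join_components(spans, image_height, fraction=0.15):
    """Merge components separated by less than `fraction` of the image height.

    spans: list of (left, right) inclusive column extents, in any order.
    """
    min_gap = fraction * image_height
    merged = []
    for left, right in sorted(spans):
        if merged:
            prev_left, prev_right = merged[-1]
            gap = left - prev_right - 1       # white columns between the two
            if gap < min_gap:
                merged[-1] = (prev_left, max(prev_right, right))
                continue
        merged.append((left, right))
    return merged
```

With a 20-row image the threshold is 3 columns, so spans (0, 5) and (7, 12) are joined while (20, 30) stays separate.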

The CRD then processes each image component separately. During left-to-right assimilation, if any CRD
output unit exceeds the threshold value of $\theta$ = 0.5, a hypothesis is made and a projected segmentation point
is computed. The image area between the last accepted segmentation point (or the image onset) and the
hypothesized segmentation point is sent to the RRD for verification. If the RRD rejects the hypothesis,
the CRD continues. If it accepts, the segmentation point is moved forward until RRD confidence decreases.
The  classification  and  confidence  are  recorded,  and  both  the  CRD  and  RRD  networks  are  reset.  If  the  CRD 
produces  no  hypothesis  during  assimilation  of  a  component,  it  is  forced  to  provide  its  most  confident  single 
digit  classification,  regardless  of  the  confidence  level. 
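The control flow of this stage is sketched below under strong simplifying assumptions. The two networks are stood in for by callables with hypothetical interfaces (`crd_scan` and `rrd_verify`), the width projection uses the doubling rule with an adjustable fraction, and the forced single-digit fallback for components that yield no hypothesis is omitted.

```python
def analyze_component(n_cols, crd_scan, rrd_verify, theta=0.5, c=0.5):
    """Sketch of the CRD/RRD interaction on one component of n_cols columns.

    crd_scan(start, col)   -> (label, activation) of the strongest CRD unit
                              after assimilating columns start..col.
    rrd_verify(start, end) -> (label, confidence), or None if the RRD
                              rejects the sub-image start..end-1.
    Returns a list of (label, confidence, segmentation_point).
    """
    results = []
    start = col = 0            # start = last accepted segmentation point
    while col < n_cols:
        _, activation = crd_scan(start, col)
        if activation > theta:
            seen = col - start + 1
            seg = min(n_cols, start + round(seen / c))
            verdict = rrd_verify(start, seg)
            if verdict is not None:
                # Advance the segmentation point until RRD confidence drops.
                while seg < n_cols:
                    candidate = rrd_verify(start, seg + 1)
                    if candidate is None or candidate[1] < verdict[1]:
                        break
                    seg, verdict = seg + 1, candidate
                results.append((verdict[0], verdict[1], seg))
                start = col = seg          # both networks restart here
                continue
        col += 1                            # RRD rejected or no hypothesis yet
    return results
```

Since the projected point always lies beyond the current column, each accepted hypothesis advances the scan and the loop terminates.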

At  this  point,  a  classification  for  the  entire  image  can  be  reported.  Stage  3,  however,  combines  the  evidence 
from  the  individual  classifications  to  produce  an  overall  confidence  level. 

Stage  3:  Decision  Making 

In  the  current  implementation,  recognition  of  each  digit  in  the  image  is  taken  to  be  independent  of  the 
other  digits.  Since  each  classification  produces  a  confidence  level  expressible  as  a  probability,  a  classification 
probability is assigned to the entire image by multiplying the probabilities associated with each digit
classification. Thus, the only action taken in Stage 3 is a simple multiplication of probabilities. It is considered
a separate stage, however, since algorithms taking into account the underlying distributions can easily be
employed not only to produce more confident classifications based on available domain knowledge, but also
to produce ranked hypotheses concerning missing or extra digits (Doster, 1977; Riseman and Hanson, 1974;
Shinghal and Toussaint, 1979).^10
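Under the independence assumption, the image-level result is just the concatenated labels and the product of the per-digit probabilities. A minimal sketch, with a hypothetical function name and input layout:

```python
import math


def classify_image(digits):
    """Combine per-digit results into one classification and confidence.

    digits: list of (label, probability) pairs in left-to-right order.
    """
    label = "".join(digit_label for digit_label, _ in digits)
    probability = math.prod(p for _, p in digits)
    return label, probability
```

Because every factor is at most 1, the confidence can only decrease as more digits are added, so confidences of strings of different lengths are not directly comparable.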

^10 In Stage 2, the CRD is operating in “forced” mode. In “unforced” mode, if the CRD either cannot produce a hypothesis
while  assimilating  a  component,  or  if  the  CRD  and  RRD  cannot  agree,  a  “don’t  know”  value  is  produced.  The  cited  algorithms 
can  be  used  to  instantiate  the  “don’t  know”  values,  based  on  information  produced  during  recognition  and  the  class  conditional 
probabilities. 

Set  Drawn From  % Overlap  % Touching  % First Correct  % Pair Accuracy
1    Train       0          9.6         95.6             87.8
2    Train       5          46.6        92.6             77.0
3    Train       10         63.8        92.8             66.2
4    Test        0          10.2        95.0             87.6
5    Test        5          45.6        92.6             74.0
6    Test        10         59.8        92.0             65.6

Table 2: Recognition results on synthesized pairs of USPS digits

5.4 System Results

Results on the problems of overlapping digit pair recognition and USPS ZIP code recognition are now
presented. The PC utilized no domain knowledge regarding individual character form, frequency, or contextual
dependencies.  In  addition,  no  restrictions  were  assumed  on  the  number  of  digits  which  could  appear  in  an 
image,  the  amount  of  white  space  (or  lack  of  white  space)  between  consecutive  digits,  or  author-specific 
style.  The  goal  was  to  evaluate  the  base  discriminatory  capabilities  of  the  system  without  relying  on  domain 
knowledge  and  heuristics.  One  underlying  assumption,  however,  was  that  adjacent  characters  showed,  to 
some  extent,  uniformity  in  their  baselines,  skew  of  print,  and  size.  This  is  not  an  overly  restrictive  assumption 
for  most  hand-print  recognition  tasks,  since  authors  tend  to  print  uniformly  within  a  word. 

Digit  Pair  Recognition 

The capability of the system to segment and recognize overlapping and touching digit pairs was tested.
Since pairs of digits which touch, or whose fields overlap, are not readily available, test data was synthesized
from  the  isolated  USPS  digit  images.  Images  of  digit  pairs  were  generated  as  explained  in  Section  4.2.  The 
system  was  tested  on  6  separate  data  sets.  Each  data  set  was  comprised  of  500  images,  with  5  images  of  each 
possible  XY  combination  of  the  10  digits.  The  6  sets  differed  depending  on  whether  the  digits  were  drawn 
from  the  training  or  testing  set,  and  how  much  they  overlapped.  Table  2  shows  recognition  results  with 
the  CRD  in  forced  mode,  rejecting  no  images.  The  columns  of  the  table  represent,  respectively,  the  test  set 
number,  the  USPS  set  from  which  the  digits  were  drawn  (train  or  test),  the  overlap  percentage  used  during 
pair  synthesis,  the  percentage  of  the  set  containing  touching  pairs  as  a  result  of  the  overlap,  the  percentage 
of  the  set  in  which  the  first  digit  of  the  pair  was  correctly  classified,  and  the  percentage  of  the  set  in  which 
the  pair  was  correctly  classified.  A  pair  classification  was  deemed  correct  if  and  only  if  both  digits  were 
correctly  classified.  Figure  18  depicts  the  images  in  Set  6,  the  most  difficult  test  set,  which  were  correctly 
identified. 

The percentage of each set containing digit pairs which touch is significant. It is common for traditional
segmenters to utilize columnar white space to hypothesize segmentation points. Yet, since this test data
(by construction) contains no inter-character white space, many segmenters would experience difficulty in
dealing with such samples.

Figure 18: Correctly identified USPS digit pair images from Set 6

In  addition,  the  percentage  of  the  set  in  which  the  first  digit  of  the  pair  was  correctly  classified  is  also 
important.  If  the  first  digit  can  be  classified  with  good  confidence,  which  the  results  suggest,  the  classification 
could  be  used  to  help  disambiguate  subsequent  digits,  particularly  if  the  class  conditional  distributions  are 
known.  One  could  also  imagine  dual  CRDs  operating  conjointly,  one  assimilating  data  from  a  left-to-right 
scan and the other from a right-to-left scan. Their classifications could be compared, with more weight
placed  on  the  first  classification  of  each  and  less  weight  on  subsequent  classifications. 

It is difficult to draw performance comparisons to other approaches due to a lack of standardized test
sets and a dearth of reported results on digit pair (and string) classification. Moreover, many results which
are reported are alphabet-dependent, relying on topological features of the digits to provide hints for
segmentation points. These results, however, are comparable to another reported result using synthesized sets
of digit pairs without alphabet-specific knowledge (Keeler et al., 1991). In addition, Table 2 shows little
variation between images created from the training or test sets, suggesting good generalization.

The ability of the system to recognize digit pairs was further tested by assuming it was known that exactly
two digits were present in each image. If such knowledge were available at the onset of recognition, a system
could  be  tailored  to  perform  more  effectively.  Here,  the  assumption  was  made  to  facilitate  error  analysis. 
After  the  system  classified  an  image,  if  the  classification  was  not  exactly  two  digits  long,  the  image  was 
considered  to  be  rejected. 

Table  3  summarizes  recognition  results,  allowing  the  system  to  reject  classifications  not  of  length  2.  It 
depicts the distribution of images rejected due to their length (1 or 3 digits long), the percentage of test set
classifications rejected due to length, and the system accuracy on the remaining images. Figure 19 illustrates
all rejected and incorrect classifications. Rejected classifications are prefixed with “R:” in their label. The
true  classification  appears  before  the  slash  in  the  label  of  an  image,  and  the  system  classification  appears 
after  the  slash. 

As  expected,  a  significant  increase  in  accuracy  is  achieved.  More  importantly,  however,  the  results  indicate 
that  the  system  is  being  too  pessimistic.  This  is  evidenced  by  the  high  ratio  of  the  number  of  rejects  of 
length  1  to  the  number  of  rejects  of  length  3  and  suggests  a  future  area  of  work. 

ZIP  Code  Recognition 

The  same  system  which  was  applied  to  digit  pair  images  was  also  applied  to  the  real-world  ZIP  codes  provided 
by  the  United  States  Postal  Service.  The  system  was  able  to  correctly  classify  66.0%  of  the  540  test  images. 
A classification was deemed correct if and only if it matched the true ZIP code exactly. Note that the 66%
rate is therefore a “worst-case” measurement: it considers a classification of an entire ZIP code incorrect in
the event that any constituent digit is incorrect.

Set  Rejected Image Length  % Total Rejects  % Accuracy
     1        3
1    39       8              9.4              96.9
2    98       5              20.6             97.0
3    149      9              31.6             96.8
4    28       9              7.4              94.6
5    95       12             21.4             94.1
6    138      9              29.4             92.9

Table 3: Recognition results on synthesized pairs of USPS digits with length rejection

Length of Classification  1   2   3    4    6    7
Number of Occurrences     1   2   10   57   22   4

Table 4: Frequencies of ZIP code rejections due to classifications not of length 5

For  the  same  reasons  as  digit  pair  recognition,  it  is  difficult  to  make  performance  comparisons  for  ZIP  code 
recognition.  One  benchmark  for  comparison,  however,  is  the  accuracy  of  the  RRD  if  it  were  able  to  inspect 
each  digit  in  a  ZIP  code  as  if  it  were  an  isolated  digit.  Since  the  RRD  achieved  a  96.0%  accuracy  on  the 
USPS test set of isolated digits, it can be expected to correctly classify approximately 100 × 0.96^5 ≈ 81.5%
of  the  ZIP  codes,  assuming  the  digits  were  correctly  isolated.  Of  course,  this  is  an  upper  bound  (given 
the  described  RRD),  and  the  system  cannot  be  expected  to  achieve  such  accuracy  for  several  reasons.  A 
significant  number  of  the  ZIP  codes  contained  touching  sequences  of  digits,  disjoint  digits,  stray  blotches, 
and ascenders/descenders from other lines on the envelope. More exactly, the set of 540 images contained
97  overlapping  digit  pairs,  more  than  80  disjoint  digits,  several  stray  blotches,  and  17  ascenders/descenders. 
The  system,  as  implemented,  can  hardly  hope  to  classify  an  image  containing  a  stray  blotch  or  an  ascender 
(descender)  correctly,  since  it  is  forced  to  generate  an  extra  digit  classification. 
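The benchmark figure follows from treating the five digit classifications as independent events, each correct with the isolated-digit accuracy. A short check of the arithmetic (variable names are illustrative):

```python
ISOLATED_DIGIT_ACCURACY = 0.96   # RRD accuracy on the USPS isolated digit test set
DIGITS_PER_ZIP = 5

# Probability that all five independently classified digits are correct.
upper_bound_percent = 100 * ISOLATED_DIGIT_ACCURACY ** DIGITS_PER_ZIP
print(round(upper_bound_percent, 1))   # 81.5
```

The gap between this figure and the observed 66.0% reflects segmentation errors and the stray marks described above, not digit classification alone.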

Performance was also measured by rejecting classifications which did not contain exactly five digits.
Accuracy increased to 80.4% at a 17.8% rejection rate. Figure 20 shows the ZIP codes which were still classified
incorrectly.  The  label  below  each  image  denotes  the  actual  ZIP  code  (before  the  slash)  and  the  system’s 
classification  (after  the  slash). 

Table 4 reports the frequencies of the length rejections. Although 70 classifications were rejected due to
omission of one or more digits, only 26 were rejected on the basis of extra digits. This suggests that
either the RRD should be more accepting, the CRD should be less pessimistic, or both.
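The two rejection counts quoted above account for the length-rejection rate reported for the 540 test images. A quick consistency check, using only figures stated in the text:

```python
TEST_IMAGES = 540
too_short = 70    # classifications with fewer than five digits
too_long = 26     # classifications with more than five digits

length_rejections = too_short + too_long
rejection_rate = 100 * length_rejections / TEST_IMAGES
print(length_rejections, round(rejection_rate, 1))   # 96 17.8
```

Short classifications outnumber long ones by nearly three to one, which is the imbalance the paragraph attributes to overly cautious hypothesis generation and acceptance.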

Using the confidences based on multiplying individual digit classification probabilities, Figure 21 depicts
rejection  rate  versus  accuracy  on  USPS  ZIP  codes,  assuming  17.8%  have  already  been  length-rejected. 

Figure 20: Incorrectly identified USPS ZIP codes

6 Discussion

In this section we discuss several issues which arose during the conception, formulation, and implementation
of the spatio-temporal approach to visual pattern recognition, and outline promising avenues for future
work. The problem of recognizing hand-print by machine is neither new nor solved. This investigation can
be placed in proper context if one considers the research and development effort that has been spent on this
problem over the past forty years. The intent of our effort was to investigate an alternate framework for
coping with some of the limitations inherent in many conventional approaches.

Traditional approaches have proceeded by dividing the word recognition problem into two phases:
segmentation into component characters followed by recognition of each component. The vast majority of research
effort has been invested in developing devices capable of carrying out the second phase of recognizing
isolated characters, with relatively modest progress made in the segmentation area. To some extent, it is not
surprising that an adequate solution has remained elusive using this methodology. Given an arbitrary
alphabet, and a pair of overlapping characters from that alphabet, it is simply not possible to segment the pair
without using a mechanism to (partially or completely) recognize the component characters. This dilemma
strongly suggests the need for the development of alternate models of recognition to cope with such cases.
This investigation has taken a first step towards addressing this fundamental problem.

6.1  Validation  of  the  Spatiotemporal  Approach 

Do spatiotemporal models offer advantages over traditional feedforward networks on visual recognition
problems such as character recognition? In a theoretical sense, there is a type of equivalence, since a
spatiotemporal network can be “unfolded” in time and viewed as a spatial network. Structurally, however, emulating
a  spatiotemporal  network  in  a  feedforward  sense  involves  multiple  replication  and  concatenation  of  network 
structures,  depending  on  the  extent  of  the  examples  to  be  assimilated.  Thus,  from  an  optimization  and 
implementation  viewpoint,  the  equivalence  is  quite  tenuous. 
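
The unfolding argument can be made concrete with a toy example. The sketch below uses a single recurrent unit with hypothetical weights `w_in` and `w_rec`; it is not the network used in this work, only a minimal illustration of why the unfolded form needs one replicated layer per time step and therefore accepts only inputs of a fixed length.

```python
from math import tanh

def run_recurrent(xs, w_in, w_rec, h0=0.0):
    """One recurrent unit applied to an input sequence of any length."""
    h = h0
    for x in xs:
        h = tanh(w_in * x + w_rec * h)
    return h

def unfold(w_in, w_rec, steps):
    """Feedforward emulation: one replicated layer per time step."""
    return [(w_in, w_rec) for _ in range(steps)]

def run_unfolded(xs, layers, h0=0.0):
    """Evaluate the unfolded network; it accepts exactly len(layers) inputs."""
    if len(xs) != len(layers):
        raise ValueError("unfolded network has a fixed input length")
    h = h0
    for x, (w_in, w_rec) in zip(xs, layers):
        h = tanh(w_in * x + w_rec * h)
    return h
```

Both forms compute the same value on an input of matching length, but the unfolded form stores `steps` copies of the structure and must be rebuilt for longer inputs.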

Our experience suggests that the spatiotemporal approach has several advantages over feedforward
networks: shift-invariance and explication of local image geometry along the temporalized axis, a reduction
in the number of free parameters, and the ability to process arbitrarily long inputs. The latter is
particularly relevant in the context of hand-print recognition, since it provides a natural mechanism within
the connectionist framework to cope with the segmentation/recognition dilemma.

Validation through empirical investigation, however, ultimately relies on the results produced. On the
problem of isolated digit recognition alone, the utility of the approach was verified by recognition results
which are comparable to the current state of the art on a real-world set of difficult digit images. Further, it
was seen that spatiotemporal networks are capable of recognizing images of multiple and overlapping digits.
Good recognition accuracy was achieved on difficult images which many traditional segmenters could not
possibly segment and recognize. Although several areas for improvement were discovered through analysis,
and extensions are possible, the presented results substantiate the approach and warrant further investigation.

6.2 System Formulation

Refined  Recognition  Device 

Ideally, the RRD should recognize characters in the target alphabet and reject all non-character blobs. Of
course, the dimensionality of the input space makes the derivation of such a device very difficult. From a
system development point of view, however, it suffices to construct an RRD capable of recognizing
characters and rejecting non-character blobs which it is likely to encounter. This point of view was taken into
consideration during RRD development for digit recognition. Within the context of the envisaged system,
it was decided that the RRD should be capable of rejecting multiple and partial digits, since it was likely to
encounter these cases.

CRD hypothesis parameters were set conservatively, on the premise that it is better to venture
premature segmentation point hypotheses than late ones, allowing the CRD/RRD interaction
to derive a suitable segmentation point. Thus, negative training instances of “partial multiples” were not
used.  However,  two  problems  cropped  up  as  a  result  of  not  employing  such  training  examples.  First, 
although  CRD  parameters  were  set  to  produce  conservative  (premature)  segmentation  point  estimates,  late 
segmentation  point  estimates  were  occasionally  made.  Second,  in  the  implemented  system,  if  a  hypothesis 
was  accepted  by  the  RRD,  the  hypothesis  boundary  was  moved  forward,  and  the  process  repeated  until 
RRD  confidence  was  reduced.  This  sometimes  resulted,  somewhat  surprisingly,  in  an  accurate  segmentation 
point  being  expanded  to  a  late  segmentation  point,  as  RRD  confidence  (incorrectly)  did  not  recede,  despite 
the  fact  that  it  was  receiving  a  partial  multiple.  To  a  large  extent,  the  CRD/RRD  interaction  relies  on  the 
ability  of  the  RRD  to  make  appropriate  rejections.  Therefore,  in  future  work,  the  RRD  should  be  trained 
on  a  sizable  body  of  negative  partial,  multiple,  and  partial  multiple  images. 
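
The boundary-advancing loop described above, and the way it fails when RRD confidence does not recede, can be sketched as follows. The function name, the column representation, and the two toy confidence functions in the usage note are assumptions made for illustration; they are not the implemented CRD/RRD.

```python
def refine_boundary(columns, start, rrd_confidence, step=1):
    """Advance a hypothesized segmentation boundary while the RRD accepts.

    columns: image columns in left-to-right order (any sequence).
    start: initial boundary index proposed by the CRD.
    rrd_confidence: callable mapping a column prefix to a confidence.
    The boundary moves forward until the confidence drops.
    """
    boundary = start
    best = rrd_confidence(columns[:boundary])
    while boundary + step <= len(columns):
        candidate = rrd_confidence(columns[:boundary + step])
        if candidate < best:
            break  # confidence receded: keep the previous boundary
        boundary, best = boundary + step, candidate
    return boundary
```

With a confidence function that peaks at the true digit width, a premature boundary is pushed forward to the correct point. With a confidence function that never recedes on partial multiples, an accurate boundary is pushed all the way to the end of the image, which is the late-segmentation failure noted above.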

Coarse  Recognition  Device 

The  CRD’s  recognition  performance  fell  short  of  the  RRD’s.  The  shortcoming  was  due  to  the  combination  of 
the  increase  in  learning  task  complexity  and  the  decrease  in  network  mechanism.  It  appears  that  the  overall 
performance can be improved if the target function is aligned later in the assimilation process so that the
CRD can witness a larger portion of the image before making hypotheses.

Although results on recognition of difficult digit pair images suggest that vertical segmentation produces
good results on the set of digits, we need to look at other sorts of segmentation boundaries. One should not
expect that a vertical slice will always segment a digit pair into relatively clean images of two (fully-formed)
digits. It should also be possible to improve vertical segmentation by augmenting the RRD training set with
images derived by vertically slicing multi-digit images. That is, starting with a set of overlapping digit pairs,
each pair could be vertically sliced at a “good” location, and the component images used as positive training
examples for the RRD.

The methodology used to train the CRD, in which isolated digits were used as training examples and
positive instances were trained to respond in accordance with rising sigmoid targets, is only one of several
possible approaches. An interesting alternative is to train the CRD on multiple digit images in order
to explicitly enforce segmentation point hypotheses. Using target functions composed of multiple Gaussian
peaks, centered at the midrange of each digit, provides a method for the CRD to recognize pairs independently
of the RRD.
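
A target function of this kind could be built as in the sketch below. The peak width `sigma` and the use of a maximum (rather than a sum) to combine peaks are assumptions made for illustration; the text specifies only that the peaks are Gaussian and centered at the midrange of each digit.

```python
from math import exp

def gaussian_peak_target(width, digit_spans, sigma=2.0):
    """Target value per image column: one Gaussian peak per digit.

    digit_spans: list of (left, right) column indices, one per digit.
    Each peak is centered at the midrange of its span; `sigma`
    (an assumed value) controls the peak width in columns.
    """
    centers = [(left + right) / 2.0 for left, right in digit_spans]
    return [
        max(exp(-((t - c) ** 2) / (2.0 * sigma ** 2)) for c in centers)
        for t in range(width)
    ]
```

For a 20-column image holding digits in columns 0 to 8 and 10 to 18, the target peaks at columns 4 and 14 and is close to zero at the boundary between the digits.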

Another approach is to perform a local search in the area of a hypothesized segmentation point, sending
each portion to the RRD. This could correct the problem of late segmentation points. A method which
seems likely to produce very good results (albeit somewhat brute-force) is to employ two CRDs, one of
which assimilates images in a left-to-right fashion and the other in a right-to-left fashion. Since it was seen
that the CRD was capable of quite accurately recognizing the first digit of a pair, the classifications made by
each of the two CRDs could be weighted accordingly. In the single digit case, a more confident classification
could be made, and problems involving inter-class similarity of localized regions could be reduced.

Finally,  it  should  be  noted  that  the  CRD  may  be  quite  useful  in  combination  with  other  segmentation 
approaches,  since  it  is  capable  of  producing  a  good  estimate  of  the  region  of  overlap  between  digits. 

Procedural  Controller 

In the implemented system, the PC was deemphasized and charged primarily with simple monitoring and
decision making tasks. The incorporation of procedural algorithms utilizing the statistical properties of
the written language can greatly augment performance, both during and after the connectionist networks’
inspection of the image. Although postprocessing algorithms have been studied and found to be effective
in augmenting recognition performance (e.g., Doster, 1977), they are of less interest here, since they may
easily be added to any recognition system.

The usage of character distributions and other domain knowledge during processing is an advantage offered
as a consequence of the spatiotemporal approach. As the CRD is assimilating an image, for example, more
of  the  spatial  structure  of  the  image  is  revealed.  As  classification  confidences  build,  one  could  imagine 
interpreting  the  confidences  with  respect  to  the  statistical  distributions  of  the  language,  thereby  affecting 
CRD  hypotheses.  In  addition,  procedural  algorithms  could  be  interjected  before  assimilation  is  complete,  in 
order  to  verify  hypotheses  or  refine  segmentation  points. 

6.3  Extensions 

The extension of the approach to other character sets is of great practical interest. Consider an N class
recognition problem, in which the goal is to classify an object as belonging to one of N classes. Assuming a
forced classification, a total of N(N-1)/2 inter-class boundaries exist. The extension from the set of 10 digits
to the set of 36 digits and lower case letters, for example, results in an increase from 45 to 630 inter-class
boundaries.* Hence, any recognition system is faced with a more difficult recognition task as the number
of classes increases.

Figure 22: An example of the role of context
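
The boundary counts quoted above are the number of unordered pairs of classes, N(N-1)/2. A short check, with a hypothetical function name:

```python
def inter_class_boundaries(n_classes):
    """Number of unordered class pairs, N(N-1)/2."""
    return n_classes * (n_classes - 1) // 2

# 10 digits give 45 boundaries; 36 digits and letters give 630.
counts = [inter_class_boundaries(n) for n in (10, 36)]
```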

Both the number of character classes and the geometric forms of the character classes contribute to the
difficulty of recognizing overlapping characters. As illustrated in Figure 22, recognizing certain characters,
or sequences of characters, is not always possible. Because of the high-level nature of the context effects involved,
such cases are of limited interest. However, similar examples in which inspection of localized areas is
necessary for recognition are of interest. Due to the nature of the basic CRD scheme, in which an image
is assimilated in a left-to-right fashion over time, certain combinations of adjacent characters can produce
“ghost characters”, requiring assimilation beyond the true segmentation point in order to disambiguate the
pair. For example, consider overlapping a lower case “c” (on the left) with a lower case “r” (on the right).
The “cr” sequence contains a “ghost image” of an “a” (or possibly an “o”) and as the CRD scans in a
left-to-right fashion, it is necessary to progress beyond the true segmentation point in order to witness enough of
the “r” to disambiguate the sequence. Further, what if the language in question contains characters
whose overlaps cannot be adequately segmented via vertical strokes?

These  concerns  are  justified,  insofar  as  they  question  the  ability  of  the  implemented  system  to  operate  on 
other  alphabets.  The  hand-printed  digit  recognition  system  described  in  detail  in  this  paper,  however,  is  only 
one possible instantiation of the general framework. Alternate mechanisms, some of which were described
earlier in this section (e.g., using dual CRDs or training the CRD to respond in Gaussian peaks), can be
used to address the shortcomings of the system, as implemented. In the simplest case, the CRD provides
valuable information (e.g., good first character recognition rates) and can be integrated with other methods.

* This number is slightly less, of course, if similar character classes (e.g., the letter “o” and the number “0”) are considered
as one. And, it is possible (albeit unlikely) that additional classes induce only trivial (easily derived) inter-class boundaries.

Conclusion


Given  the  success  of  spatiotemporal  connectionist  networks  in  hand-print  recognition,  it  is  likely  that  they  can 
be  applied  successfully  to  other  visual  recognition  problems.  Initial  investigation  into  using  spatiotemporal 
networks  to  recognize  simple  objects  has  demonstrated  a  stronger  recognition  dependence  on  object  vertices 
than  on  edge  mid-sections  (Farid  et  al.,  1993),  as  expected  according  to  a  theory  of  human  perception 
described  in  (Biederman,  1985).  Another  difficult  recognition  problem,  discriminating  between  cancerous 
and  non-cancerous  cells,  is  being  inspected,  and  initial  results  are  encouraging. 

In summary, the opportunities for future research in the application of spatiotemporal connectionist
networks to hand-print recognition, as well as to other visual recognition domains, are plentiful. The described
advantages of the approach, combined with its demonstrated success in hand-print recognition, warrant
further investigation.

Abstract

We map structured connectionist models of knowledge representation and reasoning onto existing
general purpose massively parallel architectures with the objective of developing and implementing practical,
rapid  or  real-time  reasoning  systems.  Shruti,  a  connectionist  knowledge  representation  and  reasoning 
system  which  attempts  to  model  reflexive  reasoning,  serves  as  our  representative  connectionist  model. 
Realizations  of  SHRUTI  are  developed  on  the  Connection  Machine  CM-2 — an  SIMD  architecture — and  on 
the  Connection  Machine  CM-5 — an  MIMD  architecture. 

Though SIMD implementations on the CM-2 are reasonably fast (requiring a few seconds to tens
of seconds for answering queries), experiments indicate that SPMD message passing systems are vastly
superior to SIMD systems and offer hundred-fold speedups. The CM-5 implementation can encode large
knowledge bases with several hundred thousand (randomly generated) rules and facts, and respond in
under 500 milliseconds to a range of queries requiring inference depths of up to eight.

This  work  provides  some  new  insights  into  the  simulation  of  structured  connectionist  networks  on 
massively parallel machines and is a step toward developing large yet efficient knowledge representation
and  reasoning  systems. 


1  Introduction 


Connectionist  models  are  fast  developing  into  widely  explored  architectures  for  cognition  and  intelligence. 
These  models  use  a  large  number  of  simple  nodes  which  are  profusely  interconnected  by  direct  hard  wired 
links, carrying simple, scalar messages. Massive parallelism is an important feature of any connectionist
model. Since any system that purports to model human cognition must use some form of massive
parallelism if it has to react in real-time (Feldman and Ballard, 1982; Shastri, 1991; Newell, 1992), structured
connectionist models, with their inherent parallelism and their ability to represent structured knowledge,
seem to be promising architectures for high-level (or symbolic) processing. Several structured connectionist
models have been proposed for rule-based reasoning, language processing, planning and other high-level
cognitive processes (Barnden and Pollack, 1991). From a practical standpoint, if such systems have to be fast,
efficient and usable, we will need to be able to simulate or emulate them on massively parallel platforms.
From a cognitive standpoint, where our concern is to design, test and prototype connectionist models of
cognition, we would require suitable platforms for implementing and experimenting with these highly
parallel models. Hence mapping connectionist systems onto currently existing massively parallel architectures
appears  to  be  an  avenue  worth  exploring. 

In  this  report  we  investigate  the  mapping  of  structured  connectionist  models  of  knowledge  representation 
and  reasoning  onto  existing  general  purpose  massively  parallel  architectures  with  the  objective  of  developing 
and implementing practical, real-time reasoning systems. We define rapid or real-time reasoning to be
reasoning  that  is  fast  enough  to  support  real-time  language  understanding.  We  can  understand  written 
language  at  the  rate  of  about  150-400  words  per  minute — i.e.,  we  can  understand  a  typical  sentence  in  a 
second  or  two  (Shastri  and  Ajjanagadde,  1993). 

Shruti,  a  connectionist  knowledge  representation  and  reasoning  system  which  attempts  to  model  reflexive 
reasoning  (Shastri  and  Ajjanagadde,  1993),  will  serve  as  our  representative  connectionist  model.  Efficient 
realizations of SHRUTI are developed on the Connection Machine CM-2 (an SIMD architecture) and on the
Connection Machine CM-5 (an MIMD architecture).* We shall use the term parallel rapid reasoning system
to designate these SHRUTI-based, massively parallel systems that can handle very large knowledge bases and
perform  a  large  yet  limited  class  of  reasoning  in  real-time. 

Though  SIMD  implementations  on  the  CM-2  are  reasonably  fast — requiring  a  few  seconds  to  tens  of 
seconds  for  answering  simple  queries — experiments  indicate  that  SPMD  message  passing  systems  are  vastly 
superior  to  SIMD  systems  and  offer  hundred-fold  speedups.  The  CM-5  implementation  can  encode  large 
knowledge  bases  with  several  hundred  thousand  (randomly  generated)  rules  and  facts,  and  respond  in  under 
500  milliseconds  to  a  range  of  queries  requiring  inference  depths  of  up  to  eight. 

In addition to developing viable technology for supporting large-scale knowledge base systems, this work
provides some new insights into the simulation of structured connectionist networks on massively parallel
machines and is a step toward developing large yet efficient knowledge representation and reasoning systems.

Section  2  is  an  overview  of  the  system  described  in  the  rest  of  this  report.  Section  3  provides  a  brief 
description  of  SHRUTI,  our  representative  structured  connectionist  knowledge  representation  and  reasoning 
system.  Section  4  is  a  general  discussion  of  the  issues  involved  in  mapping  SHRUTI  onto  massively  parallel 
machines.  Section  6  deals  with  the  design,  implementation  and  characteristics  of  the  SPMD  parallel  rapid 
reasoning system on the CM-5. Similar issues for the SIMD CM-2 architecture are considered in Appendix A.


* Though the CM-5 is an MIMD architecture, it can only be used in SPMD (Single Program Multiple Data) mode with
current software. See Section 6.

2 Overview of the System

The parallel rapid reasoning system supports the encoding of very large knowledge bases and their use for
real-time inference and retrieval. Toward this end, the system includes the following suite of programs and
tools:


•  A  parser  for  accepting  knowledge-base  items  expressed  in  a  human  readable  input  language.  The 
language’s syntax is similar to that of first-order logic (see Appendix D).

•  A  preprocessor  for  mapping  a  knowledge  base  onto  the  underlying  parallel  machine.  This  involves 
mapping  the  knowledge  base  to  an  inferential  dependency  network  whose  structure  is  analogous  to 
that of SHRUTI, and partitioning this network among the processors of the parallel machine.

•  A  reasoning  algorithm  for  answering  queries.  This  runs  on  the  parallel  machine  and  efficiently  mimics 
the  reasoning  process  of  our  connectionist  models. 

•  Procedures  for  collecting  a  number  of  statistics  about  the  knowledge  base  and  the  reasoning  process. 
These  include  the  distribution  of  knowledge  base  items  among  processors,  the  processor  load  and 
message  traffic  during  query  answering,  and  a  count  of  knowledge  base  items  of  each  type  (rules,  facts, 
concepts,  etc.)  activated  during  processing. 

• A utility for generating large pseudo-random knowledge bases given a specification of broad structural
constraints.  Examples  of  such  constraints  are:  the  number  of  knowledge  base  items  of  each  type,  any 
subdivision  of  the  knowledge  base  into  domains,  the  ratio  of  inter-  and  intra-domain  rules,  and  the 
depth  of  the  type  hierarchy. 

•  Several  tools  for  analyzing  and  visualizing  the  knowledge  base  and  the  statistics  gathered  during  query 
answering. 

This  collection  of  programs  and  tools  facilitates  automatic  loading  of  large  knowledge  bases,  incremental 
addition  of  items  to  an  existing  knowledge  base,  posing  of  queries  and  recording  of  answers,  and  off-line 
visualization  and  analysis  of  system  behavior.  It  also  allows  a  user  to  construct  large  artificial  knowledge 
bases  for  experimentation. 

The  system  is  interactive  and  allows  the  user  to  load  and  browse  knowledge  bases,  and  process  queries 
by  issuing  commands  at  a  prompt.  At  the  same  time  it  is  also  possible  to  process  command  files  and  use 
the  system  in  an  unattended  batch  processing  mode. 

3  Shruti — A  Connectionist  Reasoning  System 

Shruti,  a  connectionist  reasoning  system  that  can  represent  systematic  knowledge  involving  n-ary  predicates 
and variables, has been proposed by Shastri and Ajjanagadde (Shastri and Ajjanagadde, 1993; Ajjanagadde
and  Shastri,  1991).  Shruti  can  perform  a  broad  class  of  reasoning  with  extreme  efficiency.  The  time  taken 
by  the  reasoning  system  to  draw  an  inference  is  only  proportional  to  the  length  of  the  chain  of  inference  and 
is  independent  of  the  number  of  rules  and  facts  encoded  by  the  system.  The  reasoning  system  maintains 
and  propagates  variable  bindings  using  temporally  synchronous — i.e.,  in-phase — firing  of  appropriate  nodes. 
This allows the system to maintain and propagate a large number of variable bindings simultaneously as long
as  the  number  of  distinct  entities  participating  in  the  bindings  during  any  given  episode  of  reasoning  remains 
bounded.  Reasoning  in  the  proposed  system  is  the  transient  but  systematic  flow  of  rhythmic  patterns  of 
activation,  where  each  phase  in  the  rhythmic  pattern  corresponds  to  a  distinct  entity  involved  in  the  reasoning 
process and where variable bindings are represented as the synchronous firing of appropriate role and filler
nodes.  A  fact  behaves  as  a  temporal  pattern  matcher  that  becomes  ‘active’  when  it  detects  that  the  bindings 
corresponding  to  the  fact  are  present  in  the  system’s  pattern  of  activity.  Finally,  rules  are  interconnection 
patterns  that  propagate  and  transform  rhythmic  patterns  of  activity. 

Shruti attempts to model reflexive reasoning over a large body of knowledge. Shruti has been extended
in (Mani and Shastri, 1993) to reason effectively with a less restricted set of rules and facts and to enhance the
system’s ability to model common-sense reflexive reasoning.

Figure 1: (a) An example encoding of rules and facts. (b) Activation trace for the query can-sell(Mary, Book1)?.

We briefly describe the reasoning system using an example. Figure 1a illustrates how long-term knowledge
is encoded in the rule-based reasoning system. The network encodes the following rules and facts:

∀x,y,z [ give(x,y,z) ⇒ own(y,z) ],
∀x,y [ buy(x,y) ⇒ own(x,y) ],
∀x,y [ own(x,y) ⇒ can-sell(x,y) ],
give(John, Mary, Book1),
buy(John, x), and
own(Mary, Ball1).

Rule and fact encoding makes use of several types of nodes (see Figure 2): ρ-btu nodes (depicted as circles),
τ-and nodes (depicted as pentagons) and τ-or nodes (depicted as triangles). These nodes have the following
idealized behavior: On receiving a spike train, a ρ-btu node produces a spike train that is synchronous (i.e.,
in-phase) with the driving input. We assume that ρ-btu nodes can respond in this manner as long as the
inter-spike distance, π, lies in the interval [π_min, π_max]. Here π_min and π_max are the minimum and maximum
inter-spike gaps for which the system can sustain synchronous activity (Shastri and Ajjanagadde, 1993). A
τ-and node behaves like a temporal AND node, and becomes active on receiving an uninterrupted pulse train.
On becoming active, a τ-and node produces a pulse train comparable to the input pulse train. A τ-or node,
on the other hand, becomes active on receiving any activation; its output is a pulse whose width and period
equal π_max. Figure 2 summarizes node behavior. The encoding also makes use of inhibitory modifiers: links
that impinge upon and inhibit other links. A pulse propagating along an inhibitory modifier will block a
pulse propagating along the link it impinges upon. In Figure 1a, inhibitory modifiers are shown as links
ending in dark blobs.
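
The idealized behavior of the three node types and of inhibitory modifiers can be sketched in a discretized form. The sketch assumes, purely for illustration, that one period of oscillation is divided into a fixed number of phase slots and that a pulse train is a list with one 0/1 entry per slot; the function names are hypothetical.

```python
def btu_node(inputs):
    """Phase-sensitive unit: fires in exactly the phases of its input."""
    return list(inputs)

def tand_node(inputs):
    """Temporal AND: active only if the input is uninterrupted over
    the whole period; the output then mirrors the input."""
    return list(inputs) if all(inputs) else [0] * len(inputs)

def tor_node(inputs):
    """Temporal OR: any activation yields a pulse spanning the period."""
    return [1] * len(inputs) if any(inputs) else [0] * len(inputs)

def inhibit(link, modifier):
    """Inhibitory modifier: a pulse on `modifier` blocks the link's pulse."""
    return [s if not m else 0 for s, m in zip(link, modifier)]
```

Composing `inhibit` with `tand_node` shows why a single blocked phase is enough to keep a temporal AND node silent: the input it receives is no longer uninterrupted.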

Each entity in the domain is encoded by a ρ-btu node. An n-ary predicate P is encoded by a pair of
τ-and nodes and n ρ-btu nodes, one for each of the n arguments. One of the τ-and nodes is referred to as the
enabler, e:P, and the other as the collector, c:P. In Figure 1a, enablers point upward while collectors point
downward. The enabler e:P becomes active whenever the system is being queried about P. On the other
hand, the system activates the collector c:P of a predicate P whenever the system wants to assert that the
current dynamic bindings of the arguments of P follow from the knowledge encoded in the system. A rule is
encoded by connecting the collector of the antecedent predicate to the collector of the consequent predicate,
the enabler of the consequent predicate to the enabler of the antecedent predicate, and by connecting the
arguments of the consequent predicate to the arguments of the antecedent predicate in accordance with the
correspondence between these arguments specified in the rule. A fact is encoded using a τ-and node that
receives an input from the enabler of the associated predicate. This input is modified by inhibitory modifiers
from the argument nodes of the associated predicate. If an argument is bound to an entity in the fact then
the modifier from such an argument node is in turn modified by an inhibitory modifier from the appropriate
entity node. The output of the τ-and node is connected to the collector of the associated predicate. Figure 1a
shows the encoding of the facts give(John,Mary,Book1) and buy(John,x). The fact give(John,Mary,Book1)
states that ‘John gave Mary Book1’ while buy(John,x) implies that ‘John bought something’.

Figure 2: Behavior of ρ-btu, τ-and and τ-or nodes in the reasoning system.
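
The three kinds of connections that encode a rule can be written down as a list of directed links. The sketch below is a hypothetical data-structure view of that encoding, not the connectionist implementation; the function name and the string labels for nodes are assumptions.

```python
def encode_rule(antecedent, consequent, arg_map):
    """Return the directed links that encode `antecedent => consequent`.

    arg_map maps each consequent argument to the antecedent argument
    it corresponds to in the rule.
    """
    links = [
        (f"c:{antecedent}", f"c:{consequent}"),  # collector to collector
        (f"e:{consequent}", f"e:{antecedent}"),  # enabler to enabler
    ]
    links += [(cons_arg, ante_arg) for cons_arg, ante_arg in arg_map.items()]
    return links

# give(x,y,z) => own(y,z): the owner corresponds to the recipient,
# and the owned object to the given object.
give_own = encode_rule("give", "own", {"owner": "recip", "o-obj": "g-obj"})
```

Note that the enabler and argument links run from consequent to antecedent, while the collector link runs the other way.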

3.1  The  Inference  Process 

Posing  a  query  to  the  system  involves  specifying  the  query  predicate  and  its  argument  bindings.  The  query 
predicate is specified by activating its enabler with a pulse train of width and periodicity π. Argument
bindings  are  specified  by  activating  each  entity,  and  the  argument  nodes  bound  to  that  entity,  in  a  distinct 
phase,  phases  being  non-overlapping  time  intervals  within  a  period  of  oscillation. 

We illustrate the reasoning process with the help of an example. Consider the query can-sell(Mary,
Book1)? (i.e., Can Mary sell Book1?) The query is posed by (i) activating the enabler e:can-sell; (ii)
activating Mary and p-seller in the same phase (say, ρ1), and (iii) activating Book1 and cs-obj in some other
phase (say, ρ2). As a result of these inputs, Mary and p-seller fire synchronously in phase ρ1 of every period
of oscillation, while Book1 and cs-obj fire synchronously in phase ρ2. See Figure 1b. The activation from the
can-sell predicate propagates to the own, give and buy predicates via the links encoding the rules. Eventually,
as shown in Figure 1b, Mary, p-seller, owner, buyer and recip will all be active in phase ρ1, while Book1,
cs-obj, o-obj, g-obj and b-obj will be active in phase ρ2. The activation of e:can-sell causes the enablers of
all other predicates to go active. In effect, the system is asking itself three more queries: own(Mary,Book1)?,
give(x,Mary,Book1)? (i.e., Did someone give Mary Book1?), and buy(Mary,Book1)?. The τ-and node F1,
associated with the fact give(John,Mary,Book1), becomes active as a result of the uninterrupted activation it
receives from e:give, thereby answering give(x,Mary,Book1)? affirmatively. The activation from F1 spreads
downward to c:give, c:own and c:can-sell. Activation of c:can-sell signals an affirmative answer to the original
query can-sell(Mary,Book1)?.
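
The example can be traced with a serial, symbolic emulation. The sketch below is not the connectionist mechanism: bindings are passed explicitly instead of being carried by phases, and the index-based argument maps, `ARITY` table, and function names are assumptions made for illustration. The matching test follows the fact encoding described earlier: an argument left unbound in the query blocks nothing, while a bound query argument must agree with the entity named in the fact.

```python
# Each rule: (antecedent, consequent, {consequent arg index: antecedent arg index}).
RULES = [
    ("give", "own", {0: 1, 1: 2}),      # give(x,y,z) => own(y,z)
    ("buy", "own", {0: 0, 1: 1}),       # buy(x,y)    => own(x,y)
    ("own", "can-sell", {0: 0, 1: 1}),  # own(x,y)    => can-sell(x,y)
]
# None marks an argument that is not bound to an entity.
FACTS = [
    ("give", ("John", "Mary", "Book1")),
    ("buy", ("John", None)),
    ("own", ("Mary", "Ball1")),
]
ARITY = {"give": 3, "buy": 2, "own": 2, "can-sell": 2}

def fact_matches(fact_args, query_args):
    """True if no bound query argument disagrees with the fact."""
    for f, q in zip(fact_args, query_args):
        if q is None:
            continue      # argument node silent: nothing to block
        if f != q:
            return False  # fact argument unbound or a different entity
    return True

def answer(pred, args, depth=8):
    """Backward-chain from the query. Return the number of rule
    applications after which a fact fires, or None if none fires
    within `depth` applications."""
    frontier = [(pred, tuple(args))]
    for d in range(depth + 1):
        next_frontier = []
        for p, a in frontier:
            if any(fp == p and fact_matches(fa, a) for fp, fa in FACTS):
                return d
            for ante, cons, amap in RULES:
                if cons == p:
                    bound = [None] * ARITY[ante]
                    for ci, ai in amap.items():
                        bound[ai] = a[ci]
                    next_frontier.append((ante, tuple(bound)))
        frontier = next_frontier
    return None
```

For can-sell(Mary, Book1)? the fact give(John, Mary, Book1) fires after two rule applications, which mirrors the point that response time grows with the length of the inference chain.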


3.2  The  Type  Hierarchy 

Integrating a type hierarchy with the reasoning system (Mani and Shastri, 1993) allows the use of types
(categories) as well as instances in rules, facts, and queries. This has the following consequences:

•  The reasoning system can combine rule-based reasoning with inheritance and classification. For
example, such a system can infer that ‘Tweety is scared of Sylvester’, based on the generic fact ‘cats prey
on birds’, the rule ‘if x preys on y then y is scared of x’ and the is-a relations ‘Sylvester is a cat’ and
‘Tweety is a bird’.

•  The integrated system can use category information to qualify rules by specifying restrictions on the
type of argument fillers. An example of such a rule is:

∀x:animate, y:solid-obj [ walk-into(x,y) => hurt(x) ]

which specifies that the rule is applicable only if the two arguments of ‘walk-into’ are of the type
‘animate’ and ‘solid-object’, respectively.
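
The inference in the first consequence above can be sketched in C as a walk up the type hierarchy. This is only an illustration: it assumes a single-parent hierarchy stored in an array, it places birds and cats under ‘animate’ for the sake of the example, and the names `parent`, `is_a` and `scared_of` are hypothetical. The actual system performs this reasoning by spreading activation, not by an explicit loop.

```c
#include <assert.h>

enum { ANIMATE, BIRD, CAT, TWEETY, SYLVESTER, N_CONCEPTS };

/* parent[c] is the immediate supertype of concept c, or -1 at the root. */
static const int parent[N_CONCEPTS] = { -1, ANIMATE, ANIMATE, BIRD, CAT };

/* Returns 1 if `concept` is `type` or lies below it in the hierarchy. */
int is_a(int concept, int type)
{
    assert(concept >= 0 && concept < N_CONCEPTS);
    assert(type >= 0 && type < N_CONCEPTS);
    while (concept != -1) {
        if (concept == type)
            return 1;
        concept = parent[concept];
    }
    return 0;
}

/* 'cats prey on birds' combined with 'if x preys on y then y is scared
   of x': y is scared of x whenever x is a cat and y is a bird. */
int scared_of(int y, int x)
{
    return is_a(x, CAT) && is_a(y, BIRD);
}
```

With these definitions `scared_of(TWEETY, SYLVESTER)` holds while `scared_of(SYLVESTER, TWEETY)` does not, and `is_a(TWEETY, ANIMATE)` is the kind of check a type restriction such as x:animate would require.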

Each entity (concept or instance) is now represented as a cluster of nodes and is associated with two
type-hierarchy switches: a top-down T-switch and a bottom-up T-switch. Any entity can now accommodate
up to k1 dynamic instantiations, k1 being the multiple instantiation constant for concepts. The T-switches
regulate the flow of activation so as to ensure efficient and automatic dynamic allocation of instantiations.

3.3  Multiple  Dynamic  Instantiation  of  Predicates 

Extending the reasoning system to incorporate multiple instantiation of predicates (Mani and Shastri, 1993)
provides SHRUTI with the ability to simultaneously represent multiple dynamic facts about a predicate. For
example, the dynamic facts loves(John,Mary) and loves(Mary,Tom) can now be represented at the same
time. As a result, we can represent and reason using a set of rules which cause a predicate to be instantiated
more than once. We can now encode rules like:

∀x,y [ sibling(x,y) => sibling(y,x) ] and

∀x,y,z [ greater-than(x,y) ∧ greater-than(y,z) => greater-than(x,z) ]

thereby introducing the capability to handle limited symmetry, transitivity and recursion.

Introduction of multiple dynamic instantiation of predicates relies on the assumption that, during an
episode of reflexive reasoning, any given predicate need only be instantiated a bounded number of times. In
(Shastri and Ajjanagadde, 1993), it is argued that a reasonable value for this bound is around three. We
shall refer to this bound as the multiple instantiation constant for predicates, k2.^

Predicate representations are augmented so that each predicate can represent up to k2 dynamic
instantiations. Each predicate also has an associated multiple instantiation switch (or M-switch) through which
all inputs to the predicate nodes are routed. The switch arbitrates the input and brings about efficient and
automatic dynamic allocation of predicate instantiations.

^This is the factor that limits symmetry, transitivity and recursion, since each predicate can accommodate at most k2
dynamic instantiations.

4  Mapping SHRUTI onto Massively Parallel Machines

When mapping SHRUTI onto a massively parallel machine, several issues need to be taken into consideration
in order to obtain effective performance and to strike a compromise between resource usage and response time.
Several of these issues are discussed here; the discussion applies to any massively parallel machine.
Later sections bring out how these issues are resolved in actual implementations
on the CM-2 and CM-5. We have chosen the CM-2 and CM-5 as our target machines since they are
representatives of their class and offer similar user interfaces and program development environments.


4.1  Exploiting  Constraints  Imposed  by  Shruti 

As brought out in the previous sections, SHRUTI is a limited inference system, and imposes several
psychologically and/or biologically motivated constraints in order to make reasoning tractable:

•  The  number  of  distinct  entities  that  can  participate  in  an  episode  of  reasoning  is  bounded. 

•  Entities  and  predicates  can  only  represent  a  limited  number  of  dynamic  instantiations. 

•  The  form  of  rules  and  facts  that  can  be  encoded  is  constrained. 

•  The  depth  of  inference  is  bounded. 

The motivation for these constraints and their impact are discussed in (Shastri and Ajjanagadde, 1993). In
terms of mapping SHRUTI onto parallel machines, it is to our advantage to exploit these constraints
to the fullest extent in order to obtain efficient resource usage and rapid response with large knowledge bases.
Of course, if any of these constraints can be relaxed without paying a severe performance penalty, we would
like to relax it and obtain a more powerful system.

4.2  Granularity 

For effective mapping, the SHRUTI network encoding a knowledge base must be partitioned among the
processors in the machine. The network partitioning can be specified at different levels of granularity. At
the fine-grained network-level, the partitioning would be at the level of the basic nodes and links constituting
the network. A more coarse-grained knowledge-level mapping would partition the network at the level of
knowledge elements like predicates, concepts, facts, rules and is-a relations.

The appropriate level of granularity for a given situation depends on several factors including the
characteristics of the network, the processing power of individual processors on the machine and interprocessor
communication mechanisms.

4.3  Network-Level  Mapping 

At this fine-grained level of granularity, the network is viewed as a collection of nodes and links. Factors
that need to be taken into consideration when using network-level partitioning include:

Processor Allocation: Nodes and links in the network should be assigned to processors on the target
machine so as to minimize response time. Several options are possible: each node and link could be
assigned to a separate processor; groups of nodes and/or links could be assigned to a single processor;
processors could be partitioned so that some handle only nodes and some handle only links; and so on.

Nodes: The network which SHRUTI uses to encode a knowledge base consists of several different types of
nodes. A given processor could handle only one type of node or could simulate an assorted combination
of node types. The complexity of the node function should also be taken into consideration.

Links: Like nodes, the links can also be of several types, including weighted, unweighted and inhibitory
links. Placement of the links (on processors) relative to the placement of the nodes they connect is
important since this is a major factor determining the volume of interprocessor communication.

Communication and Computation: The partitioning scheme used to assign network components to
processing elements should take into account the balance of computation and communication in the
resulting system. Communication between network nodes, and hence interprocessor communication,
is an essential aspect of connectionist network simulation. Trying to eliminate or unduly minimize
interprocessor communication could lead to severe load imbalances whereby a few of the processing
elements are overburdened with computation. Trying to evenly spread the computational load among
the processing elements could result in increased communication and poor performance. A well
designed system should strike a compromise between communication and computation so as to achieve
effective performance.

4.4  Knowledge-Level  Mapping 

Knowledge-level  mapping  views  the  network  at  a  relatively  abstract  level.  At  this  granularity,  knowledge 
base  elements  like  predicates,  concepts,  facts,  rules  and  is-a  relations  form  the  primitives.  As  is  evident  from 
Section  3,  each  primitive  is  constituted  by  a  group  of  nodes  and/or  links.  The  behavior  of  these  primitives 
is  directly  simulated  without  recourse  to  the  underlying  nodes  and  links  constituting  the  primitive.  Issues 
at  this  level  include: 

Predicates: Each predicate could be assigned to a separate processor, or a group of predicates could be
assigned to a single processor. In the latter case, predicates constituting a rule could all be placed on
the same processor or could be scattered on different processors. Grouping predicates on any given
processor could reduce the number of messages required to spread activation, but would make load
balancing more difficult.

Facts: Facts could be stored on the same processors to which the corresponding fact predicates have been
assigned. An alternative approach would be to have dedicated processors for encoding facts. Such
processors will receive inputs from both the fact predicate and the type hierarchy, and will signal fact
matches globally or by communicating with the processor containing the predicate under consideration.
In any case, we may need some mechanism to circumvent the situation where processors run out of
memory since predicates could have a large number of associated facts.

Concepts: Concept clusters are used in the type hierarchy to represent types and instances. Apart from being
linked up to form the type hierarchy, these clusters must also communicate with the rule base. Careful
choice of the mechanisms used to communicate the firing phase of concepts to the rule base could make
the system more effective and reduce the number of messages exchanged in the system.

Rules: When encoding rules, effective placement of predicates can minimize communication costs. The
arbitration mechanism for accommodating multiple instantiations of a predicate also needs to be taken
into account.

•  When encoding rules, there are several choices available for the placement of predicates
constituting the rule:

-  Depending  on  the  processor  allocation  scheme  used,  we  could  allocate  predicates  occurring 
in  a  rule  to  the  same  processor.  This  would  reduce  interprocessor  communication  since  fewer 
messages  are  required  when  the  rule  fires.  This  may  not  be  easy  to  accomplish  if  predicates 
present  in  the  rule  being  encoded  have  already  been  assigned  to  different  processors. 

-  A  weaker  form  of  the  above  scheme  is  to  allocate  predicates  in  a  rule  locally — i.e.,  on  nearby 
processors.  This  scheme  is  easier  to  execute  but  will  require  relatively  more  messages  in  order 
to  fire  a  rule. 




-  The other extreme is to scatter the predicates randomly. Though this would require more
messages, and messages would travel an average distance longer than for the previous two
schemes, there are indications that random allocation may distribute messages uniformly over
the entire machine instead of localizing them to “hot spots” where all the action happens, and
would therefore reduce the incidence of message collisions (Leighton, 1992). Further, this
scheme would provide better load balancing when answering a query.

•  Identifying suitable performance measures and attempting to optimize these will aid in the
objective placement of predicates when encoding rules. The performance measure could take into
account factors like load balancing, cost of computation and communication, etc. It should be
easy to compute the measure, or at least approximate it, using only local information.

•  Predicate  instance  arbitration  mechanisms  (“switches”)  may  need  to  be  redesigned.  When  one  or 
more  predicates  are  assigned  to  each  processor,  switches  may  be  unnecessary.  Space  (“banks”) 
can  be  allocated  for  instances  of  each  predicate.  Incoming  activation  can  be  received  in  a  buffer 
and  then  allocated  to  an  empty  bank  under  program  control. 

Type Hierarchy: Most of the issues raised above will also need to be reconsidered with respect to the
location and interaction of concepts in the type hierarchy. We would also need to streamline the
interaction between the type hierarchy and the rule base for enhanced efficiency and effectiveness.
Extending the scheme mentioned above for dealing with multiple instantiation, we might be able to do
away with the type hierarchy T-switch.

It  should  be  evident  that  most  of  the  concerns  addressed  above  are  intertwined  in  that  choosing  one 
aspect  will  affect  the  choice  of  other  aspects  of  the  mapping.  On  a  global  scale,  our  aim  is  to  develop  an 
efficient  and  effective  mapping  by  ensuring  load  balancing,  minimizing  interprocessor  communication  and  by 
efficiently  using  resources  including  processors  and  memory. 

Further,  we  believe  that  knowledge-level  partitioning  is  the  appropriate  granularity  for  both  the  CM-2 
and  CM-5.  The  processing  elements  on  the  CM-2  are  reasonably  powerful  (Appendix  A)  while  the  processing 
elements  on  the  CM-5  (Section  6)  are  full-fledged  SPARC  processors.  Thus,  subnetworks  corresponding  to 
knowledge-level  primitives  can  be  implemented  using  appropriate  data  structures  and  associated  procedures 
without  necessarily  mimicking  the  detailed  behavior  of  individual  nodes  and  links  in  the  subnetwork. 

5  Shruti  on  the  CM-2 

Initially we developed SHRUTI-CM2, a data parallel implementation of SHRUTI on the Connection Machine
CM-2 (TMC, 1991a). A detailed description of SHRUTI-CM2, including design, knowledge encoding, spreading
activation and performance characteristics can be found in Appendix A. However, due to the overwhelmingly
superior performance of the SPMD implementation on the CM-5 (Section 6), the SHRUTI-CM2 project was
abandoned. Figure 3 compares the performance of the CM-2 and CM-5 parallel rapid reasoning systems.

6  Shruti  on  the  CM-5 

The Connection Machine model CM-5 (TMC, 1991b) is an MIMD machine consisting of anywhere from
32 to 1024 powerful processors.^ Each processing node is a general-purpose computer which can execute
instructions autonomously and perform interprocessor communication. Each processor can have up to 32
megabytes of local memory^ and optional vector processing hardware. The processors constitute the leaves
of a fat tree interconnection network, where the bandwidth increases as one approaches the root of the tree.

^In  principle,  the  CM-5  architecture  can  support  up  to  16K  processors. 

^The amount of local memory is based on 4-Mbit DRAM technology and will increase as DRAM densities increase.


[Plot: QUERY DEPTH vs. RESPONSE TIME]


Figure 3: A comparison of SHRUTI-CM2 running on 32K, 16K and 8K processor CM-2 machines and
SHRUTI-CM5 running on a 32 PE CM-5. The same full-fledged, structured, random knowledge base with special
rules  and  a  type  hierarchy  was  used  on  all  the  machines.  Note  that  the  timing  curve  for  the  CM-5  has  been 
multiplied  by  100.  Queries  used  were  not  randomly  generated. 


Every CM-5 system may have one or more control processors which are similar to the processing nodes but are
specialized to perform managerial and diagnostic functions. A low-latency control network provides tightly
coupled communications including synchronization, broadcasting, global reduction and scan operations. A
high bandwidth data network provides loosely coupled interprocessor communication. A standard network
interface connects nodes and I/O units to the control and data networks. The virtual machine emerging from
a combination of the hardware and operating system consists of a control processor acting as a partition
manager, a set of processing nodes, facilities for interprocessor communication and a UNIX-like programming
interface. A typical user task consists of a process running on the partition manager and a process running
on each of the processing nodes.

Though  the  basic  architecture  of  the  CM-5  supports  MIMD  style  programming,  operating  system  and 
other  software  constraints  restrict  users  to  SPMD  (Single  Program  Multiple  Data)  style  programs  (TMC, 
1994).  In  SPMD  operation,  a  single  program  runs  on  all  the  processors,  each  acting  on  its  share  of  data 
items.  Both  data  parallel  (SIMD)  and  message-passing  programming  on  the  CM-5  use  the  SPMD  model. 
If  the  user  program  takes  a  primarily  global  view  of  the  system — with  a  global  address  space  and  a  single 
thread  of  control — and  processors  run  in  synchrony,  the  operation  is  data  parallel;  if  the  program  enforces  a 
local,  node-level  view  of  the  system  and  processors  function  asynchronously,  the  machine  is  used  in  a  more 
MIMD  fashion.  We  shall  consistently  use  “SPMD”  to  be  synonymous  with  the  latter  mode  of  operation.  In 
this mode, all communication, synchronization and data layout are under the program's explicit control.

In this section we describe the design and implementation of SHRUTI-CM5, the SPMD asynchronous
message passing parallel rapid reasoning system that has been developed for the CM-5.^

^SHRUTI-CM2, the SIMD parallel rapid reasoning system for the CM-2, can also be run on the CM-5. Results of these
experiments are described in Appendix C.

6.1  Design Considerations

Granularity  of  Mapping 

The individual processing elements on the CM-5 are powerful processors and therefore a subnetwork in the
connectionist model can be implemented on a processor using appropriate data structures and associated
procedures without necessarily mimicking the detailed behavior of individual nodes and links in the
subnetwork. This entails that knowledge-level partitioning (Section 4) is the appropriate granularity for mapping
SHRUTI onto the CM-5.

Active  Messages  and  Communication 

SHRUTI-CM5 uses CMMD library functions (TMC, 1993) for broadcasting and synchronization, while almost
all interprocessor communication is achieved using CMAML (CM Active Message Library) routines.

CMAML  provides  efficient,  low-latency  interprocessor  communication  for  short  messages  (TMC,  1993; 
Eicken  et  al.,  1992).  Active  messages  are  asynchronous  (non-blocking)  and  have  very  low  communication 
overhead.  A  processor  can  send  off  an  active  message  and  continue  processing  without  having  to  wait  for  the 
message  to  be  delivered  to  its  destination.  When  the  message  arrives  at  the  destination,  a  handler  procedure 
is  automatically  invoked  to  process  the  message.  The  use  of  active  messages  improves  communication 
performance  by  about  an  order  of  magnitude  compared  with  the  usual  send/receive  protocol.  The  main 
restriction on such messages is their size: they can only carry 16 bytes of information. However, given
the constraints on the number of entities involved in dynamic bindings (≈ 10), there is an excellent match
between the size of an active message and the amount of variable binding information that needs to be
communicated between predicate instances during reasoning as indicated by SHRUTI. SHRUTI-CM5 exploits
this match to the fullest extent.
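
To make the size argument concrete, the following C sketch packs the binding information for one predicate instance into a 16-byte buffer. The field layout, the value chosen for `MAX_ARGS` and the names `Binding`, `pack_binding` and `unpack_binding` are assumptions made for illustration; the report does not specify the actual message layout, and no CMAML routine is called here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_BYTES 16   /* active-message payload size stated in the text */
#define MAX_ARGS      8    /* assumed value, for illustration only */

/* Compile-time check: 4 header bytes plus one phase byte per argument fit. */
typedef char payload_fits[(4 + MAX_ARGS <= PAYLOAD_BYTES) ? 1 : -1];

/* Hypothetical binding record sent from one predicate instance to another. */
typedef struct {
    uint16_t pred;              /* index of the destination predicate */
    uint8_t  depth;             /* depth of the reasoning chain so far */
    uint8_t  n_args;            /* number of arguments in use */
    uint8_t  phase[MAX_ARGS];   /* phase marker per argument, 0 = unbound */
} Binding;

void pack_binding(const Binding *b, uint8_t out[PAYLOAD_BYTES])
{
    assert(b != NULL && out != NULL && b->n_args <= MAX_ARGS);
    memset(out, 0, PAYLOAD_BYTES);
    out[0] = (uint8_t)(b->pred & 0xFF);     /* low byte first */
    out[1] = (uint8_t)(b->pred >> 8);
    out[2] = b->depth;
    out[3] = b->n_args;
    memcpy(out + 4, b->phase, b->n_args);
}

void unpack_binding(const uint8_t in[PAYLOAD_BYTES], Binding *b)
{
    assert(in != NULL && b != NULL);
    b->pred = (uint16_t)(in[0] | (in[1] << 8));
    b->depth = in[2];
    b->n_args = in[3];
    assert(b->n_args <= MAX_ARGS);
    memset(b->phase, 0, MAX_ARGS);
    memcpy(b->phase, in + 4, b->n_args);
}
```

Because each phase marker only needs to distinguish about ten entities, one byte per argument is more than enough, and a complete set of bindings travels in a single short message.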

6.2  Encoding  the  Knowledge  Base 

In the SHRUTI-CM5 system, the knowledge base is encoded by presenting rules and facts expressed in a
human-readable, first-order logic-like syntax specified in Appendix D. The commands recognized by SHRUTI-CM5
are described in Appendix E.

Input  Processing 

Knowledge encoding in SHRUTI-CM5 is a two-part process:

1.  Serial preprocessing. A serial preprocessor running on a workstation processes the input
knowledge base and partitions it into as many chunks as there are processors on the CM-5 partition. The
preprocessor outputs a set of files which are subsequently read by the CM-5 in parallel.

2.  Parallel knowledge base encoding. Each processor on the CM-5 independently and asynchronously
encodes the fragment of the knowledge structure assigned to it by the preprocessor. Depending on
the processor assignment scheme used, each processor on an n-processor CM-5 would typically need to
process only (1/n)th of the entire input knowledge base.

This two-part, asynchronous parallel input processing is well suited for large-scale knowledge bases. In
addition, SHRUTI-CM5 also provides a direct input mode. In this mode, the processors cooperatively and
synchronously encode the knowledge base. This mode can be used to bypass serial preprocessing and is
useful when small knowledge base fragments need to be added to an existing (large) knowledge base.
SHRUTI-CM5 also supports convenient and consistent parallel updating of large knowledge bases via incremental
preprocessing.


typedef struct cm_predbank          /* predicate bank on the CM */
{
    /* no fields used to encode KB */

    byte collector;
    byte enabler;
    byte args[MAX_ARGS];            /* arg activation phase */
    char qDepth;                    /* depth of reasoning chain
                                       which makes c: active */
} CM_PredBank;

typedef struct cm_pred              /* predicate on the CM */
{
    byte noOfArgs;
    struct cm_list *rules;          /* list of rules with pred as conseq */
    struct cm_list *facts;          /* list of facts for pred */

    byte nextFree;                  /* index of next free bank (minst) */
    struct cm_predbank bank[K2];    /* predicate banks */
    struct cm_list *ruleBPtr[K2];   /* rule back-pointers (for c: activation) */
} CM_Pred;


Figure 4: Structures used to represent predicates in SHRUTI-CM5. MAX_ARGS is the maximum number of
arguments a predicate can have. K2 is the multiple instantiation constant for predicates. The top part of
each typedef contains fields used to encode the knowledge base while the bottom part has fields used in a
given episode of reasoning.
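
The role of the nextFree and bank fields of Figure 4 can be illustrated with a short C sketch of bank allocation under the bound K2. This is a minimal sketch under stated assumptions: the values of `MAX_ARGS` and `K2`, the simplified `Bank` and `Pred` structures, the function name `allocate_bank` and the reuse of a bank that already holds identical bindings are all hypothetical and are not taken from the SHRUTI-CM5 source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4      /* assumed value, for illustration only */
#define K2       3      /* the text suggests a bound of around three */

typedef unsigned char byte;

/* Simplified versions of the predicate bank and predicate structures. */
typedef struct { byte enabler; byte args[MAX_ARGS]; } Bank;
typedef struct { byte noOfArgs; byte nextFree; Bank bank[K2]; } Pred;

/* Store an incoming instantiation (one phase marker per argument).
   Returns the bank index used, or -1 when all K2 banks are occupied. */
int allocate_bank(Pred *p, const byte *phases)
{
    int i;
    assert(p != NULL && phases != NULL && p->noOfArgs <= MAX_ARGS);
    /* Assumption: an instantiation with identical bindings is not stored twice. */
    for (i = 0; i < p->nextFree; i++)
        if (memcmp(p->bank[i].args, phases, p->noOfArgs) == 0)
            return i;
    if (p->nextFree >= K2)
        return -1;                      /* bound reached: input is dropped */
    i = p->nextFree++;
    p->bank[i].enabler = 1;
    memset(p->bank[i].args, 0, MAX_ARGS);
    memcpy(p->bank[i].args, phases, p->noOfArgs);
    return i;
}
```

Returning -1 once K2 banks are in use mirrors the bounded number of instantiations per predicate, which is what limits symmetry, transitivity and recursion during an episode of reasoning.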


In either of the input modes, the knowledge base is scanned by a lexical analyzer and parser. Parsing
the input results in the construction of internal data structures which represent the input presented to the
system. A specially designated server processor builds hash tables which keep track of processor assignments.
Whenever the system needs to know which processor houses some predicate P, the server broadcasts the
required information. The system is designed in such a manner that the server does not become a bottleneck
during the reasoning process. Information from the server is needed only when posing a query.* Once a
query has been posed, the system data structures are so configured that spreading activation will proceed
without the need for any information from the server. Maintaining a server processor therefore does not
affect inference timing in any way.

Once  a  rule  or  fact  (including  an  is-a  relation)  has  been  recognized  and  processed,  the  resulting  internal 
data  structures  will  be  used  to  encode  the  rule  or  fact  on  the  Connection  Machine  processors.  In  the  case  of 
a  query,  the  data  structures  will  be  used  to  pose  the  query  to  the  system. 

Representing  Knowledge  Base  Elements 

Knowledge  base  elements — predicates,  concepts,  rules,  facts  and  is-a  relations — are  assigned  to  a  processor 
where  the  knowledge  base  elements  are  represented  using  suitable  structures.  Any  knowledge  base  element 
is  allocated  space  on  exactly  one  processor.  Figures  4-7  define  the  structures  used  to  encode  knowledge  base 
elements.  All  processors  in  the  partition  except  the  server  can  encode  knowledge  base  elements.  The  SHRUTI 
network  is  internally  encoded  by  a  series  of  pointers  which  serve  to  link  predicate  and  concept  representations. 

*The  server  is  also  accessed  when  encoding  knowledge  in  synchronous  direct  input  mode. 
typedef struct cm_rule                  /* rule slot on the CM */
{
    /* knowledge base encoding */
    struct cm_antlist *antecedent;      /* list of antecedent predicates */
    struct cm_list *consequent;         /* consequent predicate */
    byte noOfAnts;                      /* number of ant predicates for rule */
    int weight;                         /* weight; currently unused */
    byte splCond[MAX_ARGS];             /* list of special conditions */
    int splIndex[MAX_ARGS];             /* procs containing spl cond constants */
    index splPtr[MAX_ARGS];             /* ptr to spl cond constants */

    /* reasoning episode */
    byte conseqCollector[K2];           /* c: values for the conseq pred are
                                           accumulated here; reqd for supporting
                                           multiple antecedent rules */
    char qDepth[K2];                    /* reasoning chain depth; reqd for multiple
                                           antecedent rules */
} CM_Rule;

typedef struct cm_fact                  /* fact on the CM */
{
    struct cm_pred *factPred;           /* fact predicate */
    index constant[MAX_ARGS];           /* fact argument pointers */
    index constLocation[MAX_ARGS];      /* proc containing const */
    bool active;                        /* fact active if set */
} CM_Fact;

Figure 5: Structures used to encode rules and facts in SHRUTI-CM5. MAX_ARGS is the maximum number of
arguments a predicate can have. K2 is the multiple instantiation constant for predicates. Processor indices
have type index and flags have type bool. Pointers are also of type index and index into local translation
tables on the respective processors. The top part of each typedef contains fields used to encode the knowledge
base while the bottom part has fields used in a given episode of reasoning.
typedef struct cm_entitybank            /* entity bank on the CM */
{
    /* no fields used to encode KB */

    bool buRelay;                       /* bottom-up relay */
    bool tdRelay;                       /* top-down relay */
    byte activation;                    /* entity activation phase */
} CM_EntityBank;

typedef struct cm_entity                /* entity on the CM */
{
    struct cm_list *superConcepts;      /* bottom-up links */
    struct cm_list *subConcepts;        /* top-down links */

    byte nextFree;                      /* index of next free bank */
    struct cm_entitybank bank[K1];      /* entity banks */
} CM_Entity;


Figure 6: Structures used to represent entities in the type hierarchy (in SHRUTI-CM5). K1 is the multiple
instantiation constant for concepts in the type hierarchy. Flags have type bool. The top part of each typedef
contains fields used to encode the knowledge base while the bottom part has fields used in a given episode of
reasoning.


typedef struct cm_isalink               /* is-a links on the CM */
{
    index destination;                  /* index of destination proc */
    index concept;                      /* destination concept */

    /* no fields used during reasoning episode */
} CM_isALink;


Figure 7: Structure used to encode is-a relationships in SHRUTI-CM5. Processor indices have type index.
Pointers are also of type index and index into local translation tables on the respective processors. The top
part of the typedef contains fields used to encode the knowledge base while the bottom part has fields used
in a given episode of reasoning.
initialize global statistics collection variables;

while (termination condition not met) {

    /* propagate activation in the type hierarchy */
    spread bottom-up activation;
    spread top-down activation;

    /* propagate activation in the rule base */
    back-propagate collector activation;
    check fact matches;
    propagate rule activation;

    update statistics collection variables;

}


Figure 8: The main propagation loop used in spreading activation during an episode of reasoning. The
termination condition is met when the query is answered or the system determines that the query has no
answer. Note that the order of the operations is crucial while propagating rule base activation. Activation
of predicates whose collectors became active in the previous step must be back-propagated before facts are
matched, since fact matching could activate other predicate collectors whose activation should be spread in
the next propagation step. Further, fact matching for predicates that became active in the previous step
must occur before new rules are fired, since firing rules could activate more predicates and fact matches for
these predicates should be checked in the next iteration.
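
The ordering constraint described in the caption of Figure 8 can be seen in a toy C model of the rule-base part of the loop. The model is deliberately minimal and entirely hypothetical: it uses a fixed chain of three predicates with rules P1 => P0 and P2 => P1, omits the type hierarchy and the statistics steps, and replaces the real termination test with an assumed step limit `max_steps`. None of the names below come from SHRUTI-CM5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_PREDS 3   /* toy chain: P1 => P0 and P2 => P1; the query is P0 */

typedef struct {
    unsigned char enabler[N_PREDS];    /* predicate is currently being queried */
    unsigned char collector[N_PREDS];  /* predicate has been established */
    unsigned char has_fact[N_PREDS];   /* a matching fact is stored */
} Chain;

/* Ascending order reads collector[i + 1] before this pass can change it,
   so collector activation crosses exactly one rule per step. */
static void back_propagate_collectors(Chain *c)
{
    int i;
    for (i = 0; i + 1 < N_PREDS; i++)
        if (c->collector[i + 1])
            c->collector[i] = 1;
}

static void check_fact_matches(Chain *c)
{
    int i;
    for (i = 0; i < N_PREDS; i++)
        if (c->enabler[i] && c->has_fact[i])
            c->collector[i] = 1;
}

/* Descending order: enabler activation also crosses one rule per step. */
static void propagate_rule_activation(Chain *c)
{
    int i;
    for (i = N_PREDS - 2; i >= 0; i--)
        if (c->enabler[i])
            c->enabler[i + 1] = 1;
}

/* Returns the step at which the query P0 is answered, or -1 if the
   assumed limit max_steps is reached first. */
int answer_query(Chain *c, int max_steps)
{
    int step;
    assert(c != NULL && max_steps >= 0);
    memset(c->enabler, 0, sizeof c->enabler);
    memset(c->collector, 0, sizeof c->collector);
    c->enabler[0] = 1;
    for (step = 1; step <= max_steps; step++) {
        back_propagate_collectors(c);   /* collectors set in the previous step */
        check_fact_matches(c);          /* may set collectors for the next step */
        propagate_rule_activation(c);   /* may enable predicates for the next step */
        if (c->collector[0])
            return step;
    }
    return -1;
}
```

With a fact stored only for P2, the query needs two steps to reach P2, one step to match the fact and two more to carry the collector activation back to P0, so each operation acts only on what the previous step produced.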


Unlike a serial machine, a “pointer” on the CM-5 would need both a memory address and the index of the
processor to which the required fragment of memory belongs. In order to support parallel knowledge base
encoding, the “memory addresses” are indirect and index into translation tables on the respective processors.
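
A minimal C sketch of such an indirect pointer is given below. It assumes a fixed-size table per processor; the names `GlobalPtr`, `TransTable`, `tt_register` and `tt_resolve`, the capacity `TABLE_SIZE` and the type name `index_t` (standing in for the `index` type of the figures) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 64       /* assumed capacity of a per-processor table */

typedef int index_t;        /* stands in for the `index` type of the figures */

/* A cross-processor "pointer": the owning processor and a table slot there. */
typedef struct { index_t proc; index_t slot; } GlobalPtr;

/* Per-processor translation table mapping slots to local addresses. */
typedef struct { void *addr[TABLE_SIZE]; index_t used; } TransTable;

/* Record a local object; returns its slot, or -1 if the table is full. */
index_t tt_register(TransTable *t, void *obj)
{
    assert(t != NULL && obj != NULL);
    if (t->used >= TABLE_SIZE)
        return -1;
    t->addr[t->used] = obj;
    return t->used++;
}

/* Resolve a pointer on processor `self`; returns NULL if the pointer
   names another processor or an unused slot. */
void *tt_resolve(const TransTable *t, index_t self, GlobalPtr p)
{
    assert(t != NULL);
    if (p.proc != self || p.slot < 0 || p.slot >= t->used)
        return NULL;
    return t->addr[p.slot];
}
```

One plausible benefit of the indirection, under these assumptions, is that a (processor, slot) pair can be written down before the object's actual address on the remote processor is known, which suits processors that encode their fragments independently.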


Encoding  Rules  and  Facts 

Depending  on  the  processor  allocation  scheme  (Section  4),  every  predicate  and  concept  appearing  in  the 
knowledge  base  will  be  assigned  to  a  processing  node  on  the  CM-5.  Further,  a  rule  or  fact  (including  an 
is-a  relation)  that  is  being  encoded  will  also  be  assigned  to  a  processor.  The  actual  details  of  the  processor 
allocation  are  dictated  by  the  processor  assignment  scheme  being  used.  The  shruti-cm5  design  offers  several 
options for processor assignment schemes. Shruti-cm5 implementations use random processor assignment
for predicates and concepts. Facts and is-a links are encoded on the processors containing the relevant
predicate or concept,* and rules are encoded on the processor containing the consequent predicate. Any
processor in the machine (except the server) can have both predicates and concepts assigned to it.

Once the predicates, concepts and other knowledge base elements under consideration are assigned to
processing elements on the CM-5, the knowledge base structures are built and/or updated. Rules, facts, and
is-a links are encoded by a series of pointers which link predicate and concept representations to form the
entire network.

* Assigning facts (is-a links) to the processor housing the associated predicate (concept) could result in deteriorating
performance if the distribution of facts (is-a relations) is skewed, i.e., a few predicates (concepts) have a disproportionately large
number of facts (is-a relations). In such situations, other schemes such as splitting facts (is-a links) across processors may
have to be considered.
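The allocation policy described above (random homes for predicates and concepts, facts placed with their predicate, rules placed with their consequent) can be sketched as follows. The function name, the data layout and the choice to place an is-a link with its first concept are assumptions made for illustration; the text does not say which of the two concepts hosts the link.

```python
import random
from collections import defaultdict


def assign_processors(predicates, concepts, rules, facts, isa_links,
                      n_procs, seed=0):
    """Return (home, placement) for a toy knowledge base.

    home maps each predicate or concept name to a processor index;
    placement maps a processor index to the items encoded on it.
    Names are assumed to be unique across predicates and concepts.
    """
    rng = random.Random(seed)
    home = {name: rng.randrange(n_procs)
            for name in list(predicates) + list(concepts)}
    placement = defaultdict(list)
    for antecedents, consequent in rules:
        # A rule lives with its consequent predicate.
        placement[home[consequent]].append(("rule", antecedents, consequent))
    for predicate, args in facts:
        # A fact lives with its predicate.
        placement[home[predicate]].append(("fact", predicate, args))
    for first, second in isa_links:
        # Assumption: an is-a link lives with the first concept named.
        placement[home[first]].append(("is-a", first, second))
    return home, placement


home, placement = assign_processors(
    predicates=["give", "own", "can_sell"],
    concepts=["Bird", "Animal", "Canary"],
    rules=[(["give"], "own"), (["own"], "can_sell")],
    facts=[("give", ("John", "Mary", "Book1"))],
    isa_links=[("Bird", "Animal"), ("Canary", "Bird")],
    n_procs=32,
)
```

With this policy a predicate that has many facts concentrates them all on one processor, which is the skew that the footnote warns about.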

6.3 Spreading Activation and Inference

Queries can be posed after the knowledge base has been encoded. Queries result in the activation of the
relevant predicate and concepts as described in (Shastri and Ajjanagadde, 1993) and (Mani and Shastri,
1993). The activation propagation loop is shown in Figure 8. Shruti phases are represented as “markers”:
integers with values ranging from 1 to the maximum number of phases.

The system runs asynchronously in that each processor continues with its processing irrespective of the
progress made by other processors. If an answer to the query is found, the reasoning episode terminates
immediately. If no answer is found after a certain number of asynchronous iterations, all processors
synchronize and iterate synchronously. This synchronization ensures that activation has had a chance to traverse
the depth of the network and is a safeguard against unlikely, but possible, cases of pathological imbalances
in computation and interprocessor communication load. If no answer is found even after a fixed number
of synchronous propagation steps, the reasoning episode terminates without an answer. This termination
criterion is in keeping with the constraint that reflexive reasoning can only occur up to a bounded depth. The
user can experiment with the termination criteria by setting the number of asynchronous and synchronous
iterations at compile time.
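The two-stage termination rule can be sketched as a small driver loop. This is a sequential illustration only: run_episode and make_step are hypothetical names, the iteration limits are ordinary parameters rather than compile-time constants, and the step function is a toy stand-in for one propagation step.

```python
def run_episode(step, max_async, max_sync):
    """Run asynchronous steps, then synchronous steps, until the query is answered.

    step(synchronous) performs one propagation step and returns True
    once the query has been answered.
    """
    for i in range(max_async):
        if step(synchronous=False):
            return ("answered", "asynchronous", i + 1)
    for i in range(max_sync):
        if step(synchronous=True):
            return ("answered", "synchronous", i + 1)
    # Bounded depth reached without an answer.
    return ("no answer", None, max_async + max_sync)


def make_step(answer_at):
    """Toy stand-in: the query is answered on the answer_at-th step overall."""
    count = 0

    def step(synchronous):
        nonlocal count
        count += 1
        return count >= answer_at

    return step


early = run_episode(make_step(3), max_async=5, max_sync=5)
late = run_episode(make_step(7), max_async=5, max_sync=5)
never = run_episode(make_step(100), max_async=5, max_sync=5)
```

The third element of each result is the step count within the phase that ended the episode, or the total number of steps when no answer was found.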

Each processing node (except the server) maintains several activation “frontiers” for both the rule base
and the type hierarchy. Each frontier is essentially a list of predicates or concepts that are active and
need to be considered in the current activation propagation step. The following frontiers are maintained:
a rule-frontier consisting of consequent predicates of rules under consideration in the current step; a
fact-frontier consisting of predicates for which fact matches need to be checked; a back-propagation-frontier for
handling back propagation of collector activation; and a type-hierarchy-frontier for activation propagation in
the type hierarchy. During each propagation step, all frontiers are consistently updated in preparation for the
next step in the iteration. Frontier elements are deleted after performing the required operation. A frontier
element will reappear in the frontier for the next propagation step only if the operation attempted in the
current step was unsuccessful. This ensures that the same operation (firing a specific rule, matching a
fact or firing an is-a fact) is not unnecessarily repeated. All frontiers are created and deleted asynchronously
on each processor.
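The frontier bookkeeping described above can be sketched as a pair of queues per frontier kind. The class and method names are hypothetical; the only behavior taken from the text is that an element is removed once processed and reappears in the next step only if its operation did not succeed.

```python
from collections import deque

FRONTIER_KINDS = ("rule", "fact", "back-propagation", "type-hierarchy")


class Frontiers:
    """Per-processor activation frontiers (illustrative sketch)."""

    def __init__(self):
        self.current = {kind: deque() for kind in FRONTIER_KINDS}
        self.upcoming = {kind: deque() for kind in FRONTIER_KINDS}

    def add(self, kind, element):
        """Schedule an element for the next propagation step."""
        self.upcoming[kind].append(element)

    def advance(self, kind, operation):
        """Process every element of one current frontier.

        Each element is removed; it is re-queued for the next step
        only if operation(element) returns False.
        """
        frontier = self.current[kind]
        while frontier:
            element = frontier.popleft()
            if not operation(element):
                self.upcoming[kind].append(element)

    def next_step(self):
        """Make the upcoming frontiers current and start fresh upcoming ones."""
        for kind in FRONTIER_KINDS:
            self.current[kind] = self.upcoming[kind]
            self.upcoming[kind] = deque()


frontiers = Frontiers()
frontiers.add("fact", "own")
frontiers.add("fact", "can_sell")
frontiers.next_step()
# Pretend only the fact match for "own" succeeds in this step.
frontiers.advance("fact", lambda predicate: predicate == "own")
frontiers.next_step()
remaining = list(frontiers.current["fact"])
```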

During an episode of reasoning, all interprocessor communication (including firing rules, spreading
activation in the type hierarchy and back-propagating collector activation) is effected using active messages
supported by the CMAML routines. The system has been tailored so that any information that needs to be
exchanged between two processors will always fit in a single active message.

Each activation propagation step (on a given processor) results in advancing all activation frontiers by
one level. In a given propagation step, each processor scans its frontiers and takes appropriate action,
such as firing rules or matching facts. The active messages these processors send out will invoke handlers
when they arrive at their destination. The handler functions perform the requested action, such as receiving
an instantiation or updating the relevant frontiers. In the asynchronous phase, each processing node
operates independently of the others.
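The handler-based style of communication can be imitated in a single Python process. This sketch does not use or reproduce the CMAML interface: Node, send and receive_instantiation are hypothetical names, and the payload limit of four integers is an assumption chosen only to mimic the "fits in a single message" constraint.

```python
from collections import deque

MAX_ARGS = 4  # assumed payload limit, for illustration only


class Node:
    """A simulated processing node with an inbox and named handlers."""

    def __init__(self, index):
        self.index = index
        self.inbox = deque()
        self.handlers = {}
        self.fact_frontier = []

    def register(self, name, function):
        self.handlers[name] = function

    def poll(self):
        """Invoke the handler named in each delivered message."""
        while self.inbox:
            name, args = self.inbox.popleft()
            self.handlers[name](self, *args)


def send(machine, dest, handler, *args):
    """Deliver one fixed-size message naming the handler to run at dest."""
    if len(args) > MAX_ARGS or not all(isinstance(a, int) for a in args):
        raise ValueError("payload must fit in a single message")
    machine[dest].inbox.append((handler, args))


def receive_instantiation(node, predicate_id, bank, source_processor):
    """Handler: record the instantiation and queue the predicate for fact matching."""
    node.fact_frontier.append((predicate_id, bank, source_processor))


machine = [Node(i) for i in range(2)]
machine[1].register("receive_instantiation", receive_instantiation)
send(machine, 1, "receive_instantiation", 17, 0, 0)
machine[1].poll()
```

Keeping every exchange within one fixed-size message means a handler never has to reassemble a payload from several packets.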

Type Hierarchy and Multiple Instantiation

The type hierarchy is handled in a manner that is essentially similar to the rule base. Spreading bottom-up
and top-down activation is separate and sequential. As entities go active, they broadcast their activations
to all the processors in the partition. The processors cache this information for fast, local access during fact
matching and special condition checking. In order to handle multiple instantiation (also see Appendix B),
whenever a predicate or concept receives activation, it is compared with existing activation in the banks. If the
incoming activation is not already represented, it is then deposited into the next available bank. The predicate
representing the instantiation keeps track of the source of the instantiation in order to back propagate
collector activation. An instantiation will need to be identified using (i) the processor housing the predicate or
concept; (ii) the predicate or concept that originated the instantiation; and (iii) the bank under consideration.

Figure 9: Shruti-cm5 running on a CM-5 with 32 processors. The graph shows the effect of the size of
the  knowledge  base  on  response  time  for  queries  with  varying  inference  depths.  Due  to  the  random  nature 
of  the  knowledge  base  and  the  queries  used,  response  times  for  a  given  depth  are  statistically  reliable  only 
when  a  large  number  of  data  points  are  averaged.  For  the  larger  depths,  very  few  data  points  were  available 
and  this  accounts  for  the  seemingly  better  performance  at  larger  depths.  We  expect  the  “dip”  in  the  curve 
to  “straighten  out”  as  more  data  points  are  averaged. 

Enough information is maintained when an instantiation is received so that collector activation can be
propagated  back  to  the  predicate  bank  which  originated  the  activation.  Note  that  multiple  instantiation  is 
handled  without  the  use  of  switches  (Mani  and  Shastri,  1993);  the  above  protocol  is  functionally  equivalent 
to  these  switches  and  ensures  that  (i)  any  predicate  or  concept  represents  at  most  a  bounded  number  of 
instantiations  (the  number  being  decided  by  the  multiple  instantiation  constants  K1  and  K2)  and  (ii)  a  given 
instantiation  is  represented  at  most  once  so  that  no  two  banks  of  a  predicate  or  concept  represent  the  same 
instantiation. 
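The bank protocol for multiple instantiation can be sketched as follows. The class name Banks and its fields are hypothetical; limit stands for whichever of the constants K1 or K2 applies, and source is the (processor, predicate or concept, bank) triple that the text says identifies the originating instantiation.

```python
class Banks:
    """Bounded set of instantiations for one predicate or concept (sketch)."""

    def __init__(self, limit):
        self.limit = limit   # bound on instantiations (K1 or K2 in the text)
        self.banks = []      # each entry: (bindings, source)

    def deposit(self, bindings, source):
        """Return the bank index holding the instantiation, or None if full.

        An instantiation that is already represented is not stored again,
        so no two banks ever hold the same bindings.
        """
        for index, (existing, _) in enumerate(self.banks):
            if existing == bindings:
                return index
        if len(self.banks) >= self.limit:
            return None
        self.banks.append((bindings, source))
        return len(self.banks) - 1

    def source_of(self, index):
        """Where collector activation for this bank should be sent back."""
        return self.banks[index][1]


own = Banks(limit=2)
first = own.deposit(("Mary", "Book1"), source=(3, "give", 0))
repeat = own.deposit(("Mary", "Book1"), source=(7, "buy", 1))
second = own.deposit(("John", "Ball2"), source=(5, "give", 0))
overflow = own.deposit(("Susan", "Car1"), source=(5, "give", 1))
```

In this sketch a repeated instantiation keeps the source recorded at its first arrival; whether the real system also records later sources is not stated in the text.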

Statistics  Collection 

Shruti-cm5  can  be  configured  to  collect  statistics  about  various  aspects  of  the  system  like  knowledge 
base  parameters,  processor  communication  and  computation,  and  the  reasoning  process.  These  include  the 
distribution  of  knowledge  base  items  among  processors,  the  processor  load  and  message  traffic  during  query 
answering,  and  a  count  of  knowledge  base  items  of  each  type  (rules,  facts,  concepts,  etc.)  activated  during 
processing.  Full-fledged  data  collection  can  slow  down  the  system  due  to  the  extra  time  needed  to  accumulate 
required  data. 

6.4  Characteristics  of  Shruti-cm5 

Shruti-cm5  has  been  tested  using  knowledge  bases  containing  up  to  several  hundred  thousand  rules  and 
facts.  Most  of  the  experimentation  has  been  carried  out  on  a  32  node  machine. 

Figure 10: Shruti-cm5 running on a CM-5 with 32 processors. The graph shows the number of rules fired
in answering queries with varying inference depths. See caption for previous figure for an explanation of the
unexpected “dip” in the curve.

Figure 11: Shruti-cm5 running on a CM-5 with 32 processors. The graph shows the average time needed
to fire a rule, shown as a function of knowledge base size and query depth.

Figure 12: Shruti-cm5 running on a CM-5 with 32 processors. Distribution of knowledge base elements
(rules, facts and is-a relations) on the CM-5 processors for a knowledge base with approximately 300,000
elements.

Figure 13: Shruti-cm5 running on a CM-5 with 32 processors. Computational load distribution on the
CM-5 processors. The number of active predicates, entities and facts on each processor is shown. This load
distribution was obtained when answering a query of depth 8 with a knowledge base of size approximately
300,000.

Figure 14: Shruti-cm5 running on a CM-5 with 32 processors. Communication load distribution on the
CM-5 processors. The number of active messages sent by each processor is shown. This load distribution
was obtained when answering a query of depth 8 with a knowledge base of size approximately 300,000.

Figures 9-14 illustrate the performance, timing and resource usage of shruti-cm5. Figure 9 plots
response time for varying query depths and knowledge base sizes. Figure 10 shows the number of rules fired
when answering the respective queries. In both these figures, the queries used were generated randomly, and
the values shown are averages for a given knowledge base size and query depth. About 100 queries with
depths ranging from 0 to 8 were used; some of the queries were answered while several were not. The graphs
depict the average for queries that were answered. The number of queries contributing to each data point
ranges from about 15 (for depth 0) to 1 (for maximum depth). As the number of queries averaged over
increases, we expect the curves to get smoother and statistically more reliable.
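The averaging procedure described above (group by query depth, keep only answered queries, report the mean) can be sketched in a few lines. The function name and record layout are hypothetical, and the sample values are invented solely to exercise the code; they are not measurements from the experiments.

```python
from collections import defaultdict
from statistics import mean


def average_by_depth(results):
    """Average response time per depth over answered queries only.

    results: iterable of (depth, answered, response_time_seconds).
    Returns {depth: (mean_time, number_of_queries_averaged)}.
    """
    by_depth = defaultdict(list)
    for depth, answered, seconds in results:
        if answered:
            by_depth[depth].append(seconds)
    return {depth: (mean(times), len(times))
            for depth, times in sorted(by_depth.items())}


# Invented sample data, for illustration only.
sample = [
    (0, True, 0.25),
    (0, True, 0.75),
    (0, False, 0.0),   # unanswered queries are excluded from the average
    (8, True, 0.5),    # a single query: the "average" is just this value
]
summary = average_by_depth(sample)
```

Reporting the count next to each mean makes it visible when a data point rests on a single query, as happens at the maximum depth.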

Figure 11 shows the average time needed to fire a rule as a function of knowledge base size and query
depth. When a reasonably large number of rules fire in a given reasoning episode, the time needed per rule
firing settles to a small, relatively constant value. Because random queries are posed to a random knowledge
base, there is a lot of variation in the response time and other performance statistics for a given knowledge
base size and query depth. Amid all this variation, the behavior of the “time-per-rule” metric seems to be
consistent over a variety of knowledge bases. We do not know, however, whether the “time-per-rule” metric
will remain constant if the knowledge bases are significantly larger than the ones we have experimented with.

Figure  12  shows  the  distribution  of  a  knowledge  base  with  approximately  300,000  elements  among  the 
CM-5  processors.  It  is  easily  seen  that  the  distribution  is  very  even  as  a  result  of  random  processor  allocation. 
Finally,  Figures  13  and  14  show  the  computation  and  communication  load  on  each  processor  for  a  300,000 
element  knowledge  base  and  a  query  of  depth  8.  Computation  load  is  measured  as  the  number  of  active 
predicates,  entities  and  facts  on  each  processor,  while  communication  load  is  the  number  of  active  messages 
sent  out  by  each  processor.  In  spite  of  the  unpredictable  nature  of  the  activation  trail  in  the  knowledge  base, 
communication  and  computation  load  are  relatively  well  balanced.  Processor  load  is  reasonably  balanced 
irrespective  of  the  query. 

The timing reported in the graphs is the elapsed time needed to process the queries. Random, structured
knowledge bases were used in these tests (see Section 6.5). These knowledge bases exploited the full
functionality of the reasoning system and had a mix of regular rules and facts, rules with special conditions,
quantified facts and is-a relations. Rules with special conditions included rules with repeated variables, typed
variables, existential variables and entities; rules with multiple predicates in the antecedent; and rules which
lead to multiple instantiation of predicates. In spite of the large scale of these experiments, it is evident that
shruti-cm5 provides relatively good performance. Figure 3 compares the performance of shruti-cm5 and
SHRUTI-CM2.

6.5  Generating  Knowledge  Bases 

Almost all experimentation with shruti-cm5 has been carried out using randomly generated structured
knowledge bases. Though the individual knowledge base elements are generated at random, these elements
are organized into domains, thereby imposing structure on the knowledge base. Each domain is a cluster of
predicates along with their associated rules and facts. Domains can be of two types: target domains, which
correspond to “expert” knowledge about various real-world domains; and special domains, which represent
basic cognitive and perceptual knowledge about the world. A typical structured knowledge base would
consist of several target domains and a small number of special domains. The predicates within each (target
or special) domain, and predicates across target and special domains, are richly connected by rules; predicates
across different target domains are sparsely connected. The structure imposed on the knowledge base is
a gross attempt to mimic a plausible structuring of real-world knowledge bases. This is motivated by the
notion that knowledge about complex domains is learned and grounded in metaphorical mappings from
certain basic, perceptually grounded domains (Lakoff and Johnson, 1980). However, the “knowledge” in
each domain is currently being generated at random.

The knowledge base generator takes several parameters as input. These parameters decide the number
of predicates, entities, rules and facts that will be generated, the fractions of various special rules, facts and
is-a relations, the number of domains, the distribution of the knowledge base among the domains and the
fraction of inter- and intra-domain rules. The number and maximum depth of the type hierarchies generated
can also be controlled.

The parameters supplied to generate the knowledge base used for the CM-5 experiments (identified in
the graphs as kb3) are shown below:


--- Knowledge Base Parameters ---
Number of rules: 160000
Number of facts: 160000
Number of predicates: 60000
Number of concepts: 60000
Multiple antecedent rule fraction: 0.10
Multiple instantiation rule fraction: 0.10
Special rule fraction: 0.40
Fraction of is-a facts: 0.26
Fraction of facts with E vars: 0.10
--- Domain Parameters ---
Number of special domains: 3
Number of target domains: 160
Spl-Tgt knowledge base split: 0.02
Fraction of intra-special-domain rules: 1.00
Fraction of inter-special-domain rules: 0.00
Fraction of intra-target-domain rules: 0.96
Fraction of inter-target-domain rules: 0.01
Number of type hierarchies: 10
Maximum depth of type hierarchies: 6
Fraction of shared leaves in type hiers.: 0.06

6.6 Proposed Experiments with Real-World Knowledge Bases

Recently  we  have  obtained  WordNet  (Miller  et  al.,  1990)  and  plan  to  map  it  to  our  system.  Although 
WordNet  does  not  exercise  the  full  expressive  and  inferential  power  of  our  system,  it  is  a  sufficiently  large 
knowledge structure with numerous applications and can be used to test the effectiveness of certain aspects
of  our  system  design,  especially  those  having  to  do  with  message  passing.  We  have  also  obtained  a  large 
knowledge  base  consisting  of  over  14,000  frames  and  170,000  attribute-value  pairs  about  plant  anatomy  and 
physiology  from  Bruce  Porter  of  the  University  of  Texas  at  Austin  (Porter  et  al.,  1988).  The  mapping  of  this 
knowledge base to our system is very similar to that of WordNet. We are also trying to acquire a subset of
the  CYC  knowledge  base  (Lenat  et  al.,  1990)  in  the  near  future. 

A  planned  application  of  our  knowledge  base  system  is  to  couple  it  to  the  Berkeley  Restaurant  Project 
(BeRP) speech understanding system being developed at the International Computer Science Institute
(Jurafsky et al., 1994a; Jurafsky et al., 1994b). BeRP functions as a knowledge consultant whose domain is
restaurants in the city of Berkeley, California. Users ask spoken language questions of BeRP which then
queries  a  database  of  restaurants  and  gives  advice  based  on  cost,  type  of  food,  and  location.  The  current 
BeRP  system  cannot  perform  inferences  and  any  possible  inferences  are  either  hard  wired  into  the  grammar 
or  added  to  the  restaurant  database.  Our  knowledge  base  system  will  allow  BeRP  to  make  inheritance-like 
inferences  (a  Chinese  restaurant  is  an  Asian  restaurant)  as  well  as  more  complex  inference  (if  the  user  has 
a  car  they  can  get  to  more  distant  restaurants).  The  rapid  response  of  our  knowledge  base  system  will  be 
particularly  useful  for  an  on-line  speech  understanding  system  like  BeRP. 

6.7  The  Shruti-cm5  User  Interface 

The following example illustrates the existing user interface to shruti-cm5 and supporting utilities.

1. Knowledge base generation. The user must begin with a knowledge base in a syntax recognised
by shruti-cm5. Knowledge bases in other formats should be translated into a form accepted by the
system. The following is an example knowledge base in shruti-cm5 syntax.

/* Rules */
Forall x,y,z [ give(x,y,z) => own(y,z) ];
Forall x,y [ own(x,y) => can_sell(x,y) ];
Forall x:Animal, y:Animal
    [ preys_on(x,y) => scared_of(y,x) ];
Forall x,y,z Exists t
    [ move(x,y,z) => present(x,z,t) ];
Forall x,y,z [ move(x,y,z) => present(x,y,t) ];
Forall x,y [ sibling(x,y) & born_together(x,y) => twins(x,y) ];
Forall x,y [ sibling(x,y) => sibling(y,x) ];

/* Facts */
give(John,Mary,Book1);
move(John,Nyc,Boston);
sibling(John,x);
Forall x:Cat, y:Bird [ preys_on(x,y) ];
Exists x:Robin [ own(Mary,x) ];

/* Type hierarchy */
is-a(Bird,Animal);
is-a(Cat,Animal);
is-a(Canary,Bird);
is-a(Tweety,Canary);
is-a(Sylvester,Cat);


It is also possible to create a (pseudo-random) knowledge base using the knowledge base generator
(Section 6.5). The output of the generator is in the above syntax.

2. Preprocessing and loading. The preprocessor reads the input knowledge base, assigns knowledge
base items to CM-5 processors (using one of several available processor assignment schemes) and writes
out a set of files. These files are read and encoded on the CM-5.

3.  Parallel  knowledge  processing.  Once  the  KB  has  been  loaded  on  the  CM-5  one  can  pose  queries, 
obtain  answers,  and  gather  performance  and  timing  data.  The  following  dialog  illustrates  how  the  user 
interacts with the system. The system prompt is ». User input is in typewriter font while system
output is shown in slanted font.

» i input-kb.pp
Processing file input-kb.pp .... done
» n -g
» i
Enter Rules/Facts or Query:
can_sell(Mary,Book1)?
» r
Simulating ... done
Query answered affirmatively in 0.001638 seconds
» z
Resetting network ... done.
» i query
Processing file query .... done
» r
Simulating ... done
»

The input command i is used to input the knowledge base and to pose queries. The run
command r runs a reasoning episode. It reports elapsed time if the query is answered (as in the case
of can_sell(Mary,Book1)?). If the query is not answered, no timing is displayed (as in the case of the
query contained in the file query). Further commands can be used to view knowledge base distribution
on the processors, processor load, individual processor timing, number of rules fired, active predicates
and concepts, number of messages sent, and so on (see Appendix E).

The  system  also  provides  the  capability  to  process  command  files  in  order  to  facilitate  unattended 
batch  processing. 

4.  Analysis  and  visualization.  The  data  obtained  from  reasoning  episodes  can  be  analyzed  and  plotted 
as  graphs  (Figures  9-11);  dynamic  processor  load,  timing,  etc.  can  be  visualized  (Figures  13  and  14); 
knowledge  base  distribution  can  be  analyzed  and  visualized  (Figure  12);  and  the  actual  connectivity 
of  the  knowledge  base  can  be  graphically  displayed.  All  analysis  and  visualization  are  done  off-line. 
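To make the query in step 3 concrete, the following sketch answers can_sell(Mary,Book1)? from the first two rules and the ground facts of the example knowledge base in step 1. It is a plain sequential backward chainer with a depth bound, not the parallel spreading-activation mechanism of shruti-cm5; all function names are hypothetical, and typed or quantified rules and facts are left out.

```python
import itertools

# An atom is (predicate, args); lowercase-initial arguments are variables.
RULES = [
    ([("give", ("x", "y", "z"))], ("own", ("y", "z"))),
    ([("own", ("x", "y"))], ("can_sell", ("x", "y"))),
]
FACTS = [
    ("give", ("John", "Mary", "Book1")),
    ("move", ("John", "Nyc", "Boston")),
]
_fresh = itertools.count()


def is_var(term):
    return term[0].islower()


def walk(term, subst):
    """Follow bindings until a constant or an unbound variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(args_a, args_b, subst):
    """Return subst extended so both argument tuples agree, or None on a clash."""
    if len(args_a) != len(args_b):
        return None
    subst = dict(subst)
    for a, b in zip(args_a, args_b):
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None
    return subst


def rename(rule):
    """Give the rule's variables fresh names so separate uses do not collide."""
    n = next(_fresh)

    def fresh(atom):
        pred, args = atom
        return pred, tuple(f"{a}_{n}" if is_var(a) else a for a in args)

    antecedents, consequent = rule
    return [fresh(a) for a in antecedents], fresh(consequent)


def prove(goals, subst, depth):
    """Yield substitutions satisfying all goals with at most `depth` nested rules."""
    if not goals:
        yield subst
        return
    (pred, args), rest = goals[0], goals[1:]
    for fact_pred, fact_args in FACTS:
        if fact_pred == pred:
            s = unify(args, fact_args, subst)
            if s is not None:
                yield from prove(rest, s, depth)
    if depth > 0:
        for rule in RULES:
            antecedents, (cons_pred, cons_args) = rename(rule)
            if cons_pred == pred:
                s = unify(args, cons_args, subst)
                if s is not None:
                    for s2 in prove(antecedents, s, depth - 1):
                        yield from prove(rest, s2, depth)


def ask(pred, *args, depth):
    """True if the ground query can be derived within the depth bound."""
    return next(prove([(pred, args)], {}, depth), None) is not None
```

In this sketch the query needs two rule applications (can_sell from own, own from give), so ask("can_sell", "Mary", "Book1", depth=2) succeeds while the same query with depth=1 does not, a small-scale analogue of the bounded-depth termination described in Section 6.3.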

Integrated  User  Environment 

In the existing shruti-cm5 system, all tools and utilities are separate programs. The user must manually
invoke the required program or script in order to execute any kind of processing, analysis or visualization.
Future versions of SHRUTI-CM5 will provide an integrated graphical user environment which combines a
suite of programs and tools:

•  Random knowledge base generators and programs for translating knowledge bases in (a limited number
of)  other  formats  into  a  form  accepted  by  shruti-cm5; 

•  A  parser  for  reading  knowledge  expressed  in  a  human  readable,  first-order-logic-like  language; 

•  A  preprocessor  for  transforming  knowledge  bases  into  a  form  suitable  for  efficient  loading  onto  the 
underlying  parallel  machine; 

•  An  efficient,  rapid  reasoning  system  running  on  the  underlying  parallel  machine  which  can  retrieve 
answers  to  queries  in  real-time. 

•  Statistics  collection  procedures  which  can  accumulate  data  regarding  the  knowledge  base  and  various 
aspects  of  parallel  knowledge  processing. 

•  A  variety  of  tools  for  analyzing  and  visualizing  the  knowledge  base  and  data  produced  by  the  statistics 
modules. 

The parallel rapid reasoning system would form the core of the shruti-cm5 system around which all the
other  programs  and  tools  would  be  organized.  Data  processing,  analysis  and  visualization  tools  would  be  a 
combination  of  scripts,  already  existing  tools  and  custom  generated  programs.  Except  for  the  parallel  part, 
all  the  other  tools  would  be  off-line,  usable  on  a  workstation  and  integrated  into  an  interactive,  easy-to-use 
graphical  interface. 

The shruti-cm5 system would also provide for automated remote access to the CM-5 so that all off-line
tools  and  processing  can  be  confined  to  the  local  workstation.  The  parallel  reasoning  episodes  will  be  run  on 
the  remote  CM-5  and  the  results  and  output  transferred  back  to  the  local  workstation  for  further  processing. 


7  Related  Work 

There has been considerable work in the conceptual design of massively parallel systems based on spreading
activation, marker passing, and connectionism (Lange and Dyer, 1989; Sun, 1991; Barnden and Srinivas,
1991; Waltz and Pollack, 1985; Charniak, 1983; Fahlman, 1979). However, only very few researchers have
tried to implement knowledge base systems on existing parallel platforms. A salient example of such work
is the PARKA system (Evett et al., 1993) implemented on the CM-2. Parka encodes frame-based
knowledge (analogous to a semantic network) and supports efficient computation of inheritance, recognition, and
structure retrieval, which is a generalization of recognition. The performance of PARKA has been tested using
pseudo-random networks (with up to 130,000 nodes) as well as subsets of CYC (Evett et al., 1993; Lenat
et al., 1990). The CYC subsets used had about 26,000 units. Parka’s run time for inheritance queries is
O(d) and for recognition queries is O(d + p), where d is the depth of the is-a hierarchy and p is the number
of property constraints. Actual run-times range from a fraction of a second (for inheritance queries) to a
little more than a second (for recognition queries with 15-20 conjuncts). Parka does not support
rule-based reasoning; it can only handle frame-based knowledge with some extensions to deal with memory-based
reasoning.


Semantic  Networks  on  Special  Purpose  Hardware 

Fahlman (1979) proposed the design of NETL, a massively parallel machine that could execute marker
passing algorithms for computing inheritance and recognition in parallel. Although this machine was never
built, it influenced the design of the CM-2 (Hillis, 1985). Researchers such as Moldovan (1993) have also
proposed and built special purpose hardware for realizing semantic networks and production systems.

The  partitioning  and  mapping  of  production  systems  (or  rule-based  systems)  onto  multiprocessors  is 
considered  in  (Moldovan,  1989).  A  performance  index  is  obtained  by  analyzing  rule  interdependencies. 
This performance index is optimized so as to maximize inherent parallelism and minimize interprocessor
communication.  Optimizing  the  performance  index  is  intractable  and  approximations  and  simplifications  are 
necessary  in  order  to  make  the  problem  tractable.  A  message-passing  multiprocessor  architecture  (RUBIC, 
for  Rule-Based  Inference  Computer)  for  parallel  execution  of  production  systems  is  also  described. 

The  Semantic  Network  Array  Processor  (SNAP)  developed  at  the  University  of  Southern  California  is 
described  in  (Moldovan  et  al.,  1992).  The  conceptual  design  of  the  SNAP  is  based  on  associative  memory 
and  marker  passing,  and  is  optimized  for  representing  and  reasoning  with  semantic  networks.  The  SNAP 
provides  a  special  instruction  set  for  network  creation  and  maintenance,  marker  creation  and  propagation, 
logic  operations  and  search/retrieval.  A  SNAP  prototype  has  been  built  with  off-the-shelf  components 
and used to implement a parallel, memory-based parser (Moldovan et al., 1992). The parser is capable of
processing  sentences  in  1-10  seconds  depending  on  the  sentence  length  and  the  size  of  the  knowledge  base 
used.  The  largest  knowledge  base  used  consisted  of  about  2,000  nodes. 

Unlike  shruti  and  parka,  SNAP-based  knowledge  representation  systems  use  special  purpose  hardware. 
Further,  SNAP-based  systems  can  only  deal  with  semantic  networks  and  do  not  currently  support  the  full 
range  of  inferences  supported  by  SHRUTI. 


8  Conclusion 

We  have  described  an  SPMD  mapping  of  shruti  on  the  Connection  Machine  CM-5.  We  have  discussed  issues 
involved  in  the  design  and  implementation  of  this  system — both  from  machine  independent  and  machine 
dependent  points  of  view.  From  the  test  results  summarized  in  the  previous  sections,  it  is  evident  that 
SPMD  implementations  are  vastly  superior  in  comparison  with  the  SIMD  system  and  offer  speedups  of 
several  hundred.  In  view  of  its  greatly  improved  performance,  we  plan  to  expend  our  effort  in  improving 
and  extending  the  asynchronous  (SPMD)  message  passing  system  on  the  CM-5.  The  SPMD  rapid  reasoning 
system  on  the  CM-5  is  also  being  mathematically  analyzed  (Mani,  1994)  with  the  objective  of  obtaining 
quantitative  measures  which  can  be  used  to  further  improve  performance. 

Shruti-cm5 currently supports only backward reasoning. Future work on the CM-5 will involve
developing a forward reasoning system and an integration of the forward and backward reasoners.

All  experiments  reported  here  have  used  randomly  generated  knowledge  bases.  As  noted  in  Section  6.6, 
we  plan  to  encode  large  real-world  knowledge  bases  on  the  system  and  interface  it  with  applications.  This 
will  not  only  help  us  evaluate  the  parallel  rapid  reasoning  systems  more  thoroughly,  but  will  also  result  in 
practical  and  usable  systems.  Depending  on  the  kind  of  knowledge  bases  used,  we  also  expect  this  endeavor 
to  provide  insights  into  aspects  of  reflexive  reasoning. 

A  Shruti on the CM-2


The  CM-2  (TMC,  1991a)  is  an  SIMD  data  parallel  computing  machine  which  can  be  configured  with  up  to 
64K  processing  elements.  Each  processor  has  several  kilobits  of  local  memory  and  can  execute  arithmetic 
and  logical  instructions,  calculate  memory  addresses,  read  and  store  information  in  memory  and  perform 
interprocessor  communication.  The  processors  are  organized  as  an  n-dimensional  hypercube.  The  CM-2  is 
controlled by a standard serial front end processor (usually a VAX or SUN machine). A sequencer decodes
commands  from  the  front  end  and  broadcasts  them  to  the  data  processors,  all  of  which  then  execute  the  same 
instruction  simultaneously  and  synchronously.  A  NEWS  grid  provides  fast  communication  between  adjacent 
processors  and  a  router  network  provides  general  interprocessor  communication  between  any  two  processors. 

The design and implementation of the SIMD parallel rapid reasoning system on the CM-2, SHRUTI-cm2,
is based on knowledge-level partitioning (Section 4) of the underlying network generated by a knowledge
base. We describe techniques used to encode the knowledge base and implement spreading activation
when  answering  queries.  We  then  explore  the  characteristics  of  the  system  by  running  a  battery  of  tests.  All 
discussion  pertains  only  to  backward  reasoning. 

A.1 Encoding the Knowledge Base

The knowledge base is encoded by presenting rules and facts (including is-a facts) to the SHRUTI-CM2 system.
The input syntax for rules, facts, is-a relations and queries is specified in Appendix D. Appendix E gives a
listing of commands recognized by shruti-cm2.

Input  Processing 

A  lexical  analyzer  and  parser  read  the  input,  parse  it  and  build  internal  data  structures  which  represent  the 
rules  and/or  facts  presented  to  the  system.  All  input  processing  is  performed  sequentially  on  the  front-end. 

As  predicates  and  entities  (or  concepts)  are  recognized  in  the  input,  the  parser  builds  hash  tables  which 
keep  track  of  processor  assignments.  The  hash  tables  can  be  used  to  efficiently  access  these  predicates  and 
entities  while  encoding  rules  and  facts,  posing  queries  and  inspecting  their  state. 

Once a rule or fact (including an is-a relation) has been recognized and processed, the resulting internal
data  structures  can  be  used  to  encode  the  rule  or  fact  on  the  Connection  Machine  processors.  In  the  case  of 
a  query,  the  data  structures  will  be  used  to  pose  the  query  to  the  system. 

Representing  Knowledge  Base  Elements 

Knowledge  base  elements  are  represented  on  the  processors  using  parallel  structures.  A  parallel  structure 
allocates  space  for  the  specified  structure  on  every  processor.  Figures  15  and  16  indicate  the  structures 
used  to  encode  predicates,  rules  and  facts  in  the  rule-base.  The  structures  used  to  encode  concepts  and 
is-a  relationships  in  the  type  hierarchy  are  similar  (though  simpler).  Note  that  a  parallel  structure  will  be 
allocated for each knowledge base element: predicate, fact, rule, concept and is-a link. When the knowledge
base  grows  and  more  space  is  needed,  the  size  of  the  parallel  structure  is  doubled.  The  virtual  processor 
capability  of  the  CM-2  ensures  that  each  (physical)  processor  now  houses  two  structures.  This  is  transparent 
to  the  programmer  and  one  can  still  assume  that  each  processor  houses  one  structure,  with  double  the  number 
of  (virtual)  processors  in  the  machine.  Using  this  scheme,  the  representation  automatically  scales  with  the 
size  of  the  knowledge  base.  As  the  number  of  virtual  processors  increases,  the  system  will  run  proportionately 
slower.  The  virtual  processor  mechanism  therefore  provides  a  simple,  scalable  and  transparent  way  of  trading 
off  time  for  space. 
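
The doubling scheme can be illustrated with a short C sketch. This is not the C* code of SHRUTI-cm2: the function names `vp_count` and `vp_ratio`, and the starting point of one virtual processor per physical processor, are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: smallest number of virtual processors (VPs) that can
 * hold `elements` structures, starting from one VP per physical processor
 * and doubling each time more space is needed. Assumes physical > 0. */
size_t vp_count(size_t physical, size_t elements)
{
    size_t vps = physical;
    while (vps < elements)
        vps *= 2;               /* the parallel structure is doubled */
    return vps;
}

/* VP ratio: number of structures housed on each physical processor.
 * The text states that running time grows in proportion to this ratio. */
size_t vp_ratio(size_t physical, size_t elements)
{
    return vp_count(physical, elements) / physical;
}
```

Under these assumptions, a knowledge base of 160,000 elements on a machine with 4,096 physical processors would occupy 262,144 virtual processors (a ratio of 64), leaving a large number of allocated slots unused. This is the time-for-space trade-off in miniature.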

/* predicate on the CM */

typedef struct cm_pred
{
    bool used;                      /* flag */
    byte noOfArgs;
    byte nextFree;                  /* index of next free bank (inst) */
    struct cm_predbank bank[K2];    /* predicate banks */
} CM_Pred;

typedef struct cm_predbank          /* predicate bank on the CM */
{
    /* no fields used to encode KB */
    bool cChange;                   /* collector value changed */
    bool eChange;                   /* enabler value changed */
    byte collector;
    byte enabler;
    byte args[MAX_ARGS];            /* arg activation phase */
} CM_PredBank;

Figure 15: Structures used to represent predicates in shruti-cm2. MAX_ARGS is the maximum number of
arguments a predicate can have. K2 is the multiple instantiation constant for predicates. Flags have type
bool. The top part of the typedefs contains fields used to encode the knowledge base while the bottom part
has fields used in a given episode of reasoning.


Encoding  Rules  and  Facts 

Depending  on  the  processor  allocation  scheme  used  (Section  4),  every  predicate  and  entity  appearing  in 
the  knowledge  base  will  be  assigned  to  a  (virtual)  processing  element  on  the  CM-2.  Further,  a  rule  or 
fact  (including  an  is-a  relation)  that  is  being  encoded  will  also  be  assigned  to  a  (virtual)  processor.  These 
two processor allocations (one for the relevant predicates/entities and the other for the rule/fact under
consideration) may or may not be independent. The actual details of the processor allocation are dictated
by  the  processor  assignment  scheme  being  used. 

The current and more recent versions of SHRUTI-cm2 use random processor assignment schemes for all
knowledge base elements: predicates, concepts, facts, rules and is-a links. Earlier versions used random
allocation for predicates and concepts; however, facts and is-a links were encoded on the processors containing
the relevant predicate or concept and rules were encoded on the processor containing the consequent
predicate.

Once  the  predicates,  concepts  and  other  knowledge  base  elements  under  consideration  are  assigned  to 
processing  elements  on  the  CM-2,  all  that  remains  to  be  done  in  order  to  encode  the  rule/fact  is  to  correctly 
fill  out  the  various  fields  in  the  relevant  structures.  Encoding  a  fact  involves  the  corresponding  predicate  and 
the  entities  filling  the  arguments  of  the  predicate.  Encoding  a  rule  (is-a  relation)  involves  two  predicates 
(concepts)  and  a  rule-slot  (is-a  link).  If  a  rule  has  multiple  predicates  in  the  antecedent,  the  encoding  is 
slightly  more  complex,  as  pictured  in  Figure  17. 

/* rule slot on the CM */

typedef struct cm_rule
{
    /* knowledge base encoding */
    bool  used;                     /* flag */
    bool  dummy;                    /* rule slot is dummy if flag set */
    index antecedent;               /* invalid for head rule slots */
    index consequent;               /* points to head slot in a dummy */
    byte  noOfAnts;                 /* > 1 in a head rule slot */
    int   weight;
    byte  antNoOfArgs;              /* invalid for head rule slots */
    byte  argMap[MAX_ARGS];         /* arg mapping; invalid on head slot */
    byte  splCond[MAX_ARGS];        /* not used in dummy slots */
    int   splIndex[MAX_ARGS];       /* not used in dummy slots */

    /* reasoning episode */
    byte  dummyCollector[K2];       /* used only in dummy slots */
    bool  fire;                     /* rule can fire if set */
    bool  selected;                 /* instantiation selected if set */
    byte  nextBank;                 /* next conseq pred bank to consider */
    byte  bankSelected[K2];         /* rule back pointer */
    /* NOTE: bankSelected[i] = j => bank i in the ant pred has
       instantiation from bank j in the conseq pred; valid only on
       non-head rule slots; in a head rule slot bankSelected[i] == i */
} CM_Rule;

typedef struct cm_fact              /* fact on the CM */
{
    bool  used;                     /* flag */
    index factPred;                 /* fact predicate index */
    byte  noOfArgs;
    index constant[MAX_ARGS];       /* fact arguments */
    bool  active;                   /* fact active if set */
} CM_Fact;

Figure 16: Structures used to encode rules and facts in shruti-cm2. MAX_ARGS is the maximum number of
arguments a predicate can have. K2 is the multiple instantiation constant for predicates. Flags have type
bool while processor indices have type index. The top part of the typedefs contains fields used to encode
the knowledge base while the bottom part has fields used in a given episode of reasoning.

Figure 17: Encoding single- and multiple-antecedent rules. The figure on the left indicates the encoding
of single-antecedent rules while the figure on the right depicts the encoding of multiple-antecedent rules.
Every predicate and rule-slot is housed on a processor. Arrows indicate links which are implemented using
interprocessor communication.

A.2  Spreading  Activation  and  Inference 

Queries can be posed after the knowledge base has been encoded. Again, queries have a specific syntax (as
described in Appendix D) and result in activating the relevant predicate and concepts in keeping with the
description in (Shastri and Ajjanagadde, 1993) and (Mani and Shastri, 1993). The reasoning episode can
then be run, either step-wise or to completion. We now describe the mechanics of spreading activation and
matching facts in the system. The gross structure of the activation propagation loop is indicated in Figure 8.
Phases in SHRUTI are represented as “markers”: integers with values ranging from 1 to the maximum number
of phases.

The  Rule  Base 

As shown in Figure 8, spreading activation in the rule base consists of three steps:

• Propagating rule activation. Spreading activation in the rule base by rule firing is achieved by executing
the following:

1. Every non-dummy rule-slot gets the instantiation in the consequent predicate bank under consideration.

2. All non-dummy rule-slots check if all special conditions in the rule are satisfied.

3. If all special conditions are satisfied, the dummy rule-slots get the respective instantiations from
the corresponding head rule-slot.

4. All non-head rule-slots transform the activation and send it to the respective antecedent predicates.

In the process of firing a rule, the system maintains sufficient book-keeping information to back-propagate
collector activation to the consequent of a rule.

Once a rule fires, it will not fire again unless a new bank of the consequent predicate becomes active.
This ensures that the same rule does not fire repeatedly, thereby minimizing unnecessary interprocessor
communication. Note also that the processor housing the rule-slot will need to communicate with other
processors in order to get predicate bank instantiations, get information from the head rule-slot, send
information to dummy rule-slots and send the transformed activation to the antecedent predicate.

• Checking fact matches for active predicates. All facts for predicates which have active collectors are
matched simultaneously. Processors encoding the facts communicate with the processors housing the
relevant predicates and concepts in order to check if the firing “phases” match. If a fact “fires”, the
collector of the corresponding predicate is activated.

• Back-propagating collector activation. Sending collector activation to predicate banks which originated
the activation involves the following:

1. Non-head rule-slots get the state of the predicate collector.

2. Dummy rule-slots send the collector value to the head rule-slot which accumulates all the incoming
values.

3. Non-dummy rule-slots send the activation to the respective consequent predicates provided the
collector activation exceeds a threshold. The threshold could depend on the number of antecedent
predicates for the rule, the level of activation of antecedent predicate(s), and/or other factors.

Rule-slots  that  have  already  propagated  collector  activation  to  the  corresponding  predicate  bank  will 
not  participate  in  this  step.  Again,  this  is  done  in  order  to  minimize  unnecessary  interprocessor 
communication. 

The  Type  Hierarchy 

Propagating  activation  in  the  type  hierarchy  is  similar  to  spreading  activation  in  the  rule-base,  except 
that  it  is  much  simpler.  Spreading  bottom-up  activation  and  top-down  activation  are  handled  separately 
(and sequentially) in the type hierarchy. When spreading bottom-up (top-down) activation, all is-a links
which have an active bank in the subconcept (superconcept) “fire” and spread activation to the respective
superconcept (subconcept). The is-a link gets activation from the subconcept (superconcept) and sends
it  to  the  superconcept  (subconcept).  Again,  in  order  to  minimize  communication,  we  ensure  that  any  new 
activation  traverses  corresponding  is-a  links  exactly  once. 

Multiple  Instantiation 

Multiple  instantiation  in  shruti-cm2  is  handled  without  the  use  of  switches  (Mani  and  Shastri,  1993). 
Predicates  and  concepts  can  accommodate  K2  and  K1  instances  respectively.  When  spreading  activation  in 
the  network,  predicate  and  concept  banks  are  considered  one  at  a  time.  In  other  words,  in  a  given  clock  cycle 
(i.e.,  in  one  iteration  of  the  propagation  loop;  see  Figure  8)  only  one  active  bank  of  a  predicate  or  concept 
will  be  considered.  As  described  in  Appendix  B,  care  is  taken  to  avoid  potential  problems  that  could  result 
from  this  technique. 

Whenever a predicate or concept receives activation, it is compared with existing activation in the banks.
If  the  incoming  activation  is  not  already  represented,  it  is  then  deposited  into  the  next  available  bank.  The 
rule-  or  link-slot  that  sent  in  the  activation  is  notified  that  the  instantiation  it  sent  has  been  selected.  In  the 
rule  base,  the  rule-slot  receives  the  bank  number  accommodating  the  new  instantiation.  This  information 
is  needed  when  back-propagating  collector  activation.  If  the  incoming  activation  is  already  represented  in 
the  predicate  or  concept,  or  if  all  banks  are  already  in  use,  the  incoming  activation  is  discarded.  Even  in 
this case, rule-slots are notified so that they can proceed to the next bank of the consequent predicate. A
rule-slot retries sending the same instantiation if it does not receive notification that the activation was either
selected or discarded.

Figure 18: Shruti-cm2 running on a CM-2 with 4K processors. The graph shows the effect of the size of the
knowledge base on response time for queries which require inference depths ranging from 0 to 10. Queries
used were not randomly generated. The knowledge base used was not structured.

The  above  protocol  simulates  the  function  of  the  multiple  instantiation  switches,  and  brings  about  efficient 
dynamic  allocation  of  predicate  and  concept  banks  to  ensure  that: 

• Any predicate (concept) in the system represents at most K2 (K1) instantiations.

•  A  given  instantiation  is  represented  at  most  once;  in  other  words,  no  two  banks  represent  the  same 
instantiation. 
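
The bank allocation protocol described above can be sketched in plain C as follows. This is an illustrative, sequential approximation rather than the C* implementation: the names `PredBanks`, `InstStatus` and `deposit`, the constant values, and the choice to compare instantiations by their argument phases are all assumptions.

```c
#include <string.h>

#define MAX_ARGS 4   /* assumed value, for illustration only */
#define K2       3   /* assumed multiple instantiation constant */

typedef unsigned char byte;

typedef enum { INST_SELECTED, INST_DISCARDED } InstStatus;

typedef struct {
    byte nextFree;              /* index of next free bank */
    byte args[K2][MAX_ARGS];    /* argument phases, one row per bank */
} PredBanks;

/* Offer an incoming instantiation to a predicate. The sender is always told
 * the outcome. On INST_SELECTED, *bank receives the bank number, which a
 * rule-slot would keep for back-propagating collector activation. */
InstStatus deposit(PredBanks *p, const byte inst[MAX_ARGS], byte *bank)
{
    for (byte b = 0; b < p->nextFree; b++)
        if (memcmp(p->args[b], inst, MAX_ARGS) == 0)
            return INST_DISCARDED;          /* already represented */
    if (p->nextFree >= K2)
        return INST_DISCARDED;              /* all banks in use */
    memcpy(p->args[p->nextFree], inst, MAX_ARGS);
    *bank = p->nextFree++;
    return INST_SELECTED;
}
```

Both guarantees listed above follow directly from the two early returns: a predicate never holds more than K2 instantiations, and no two banks hold the same one.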

Statistics  Collection 

Apart  from  timing  the  reasoning  episodes,  shruti-cm2  can  also  be  configured  to  gather  data  about  several 
other  aspects  including  knowledge  base  parameters  (number  of  rules,  facts,  is-a  relationships,  and  concepts) 
and  communication  data  (number  of  messages,  sends  and  gets).  Enabling  full-fledged  data  collection  can 
slow  down  the  system  due  to  the  extra  time  needed  to  accumulate  the  required  data. 

A.3  Characteristics  of  Shruti-cm2 

Shruti-cm2  has  been  run  on  a  4K  CM-2  and  on  a  32K  CM-2.  Both  machines  had  256  kilobits  of  memory 
on  each  processor.  Figures  18  and  19  summarize  the  results  of  experiments  run  on  these  machines.  In  these 
figures, the response time shown is the actual CM time used. The timing routines available on the CM-2
also report elapsed time for the reasoning episode. Elapsed time is affected by other processes running on
the front end and is therefore unreliable. The knowledge bases used in these experiments were generated
at random, and did not contain is-a relationships or rules with special conditions. The inference path for
a given query was tailored to ensure a reasonable branching factor: at least one of the predicates in the
activation frontier had five or more outgoing links originating from it.

Figure 19: Shruti-cm2 running on a CM-2 with 32K processors. The graph shows the effect of the size
of the knowledge base on response time for queries which require inference depths ranging from 0 to 10.
Queries used were not randomly generated. The knowledge base used was not structured.

Based on these and other experiments, and on the design of shruti-cm2, we can summarize the characteristics
of the system:

• The response time is approximately linear with respect to the size of the knowledge base, for knowledge
bases with up to 160,000 elements. Thus, as the size of the knowledge base increased, query answering
time increased proportionately. This is to be expected since more predicates would be active on the
average and would entail proportionately more processing and interprocessor communication as the
size of the knowledge base increases.

Beyond  a  certain  limit,  we  expect  response  time  to  increase  steeply  with  the  size  of  the  knowledge 
base.  However,  effort  was  not  expended  in  locating  this  limit  or  studying  the  characteristics  of  the 
system  near  this  threshold  since  our  focus  shifted  to  the  CM-5.  As  a  result,  all  timing  results  stated 
here  apply  only  to  knowledge  bases  with  up  to  160,000  rules  and  facts. 

•  Time  taken  to  answer  a  query  increases  as  the  average  branching  factor  of  the  knowledge  base  increases. 
This  again  is  caused  by  increased  processing  and  interprocessor  communication. 

•  Increasing  inference  depth  needed  to  answer  a  query  proportionately  increases  response  time.  Every 
extra  inference  step  requires  an  extra  activation  propagation  step  (i.e.,  an  extra  iteration  of  the  loop 
in  Figure  8). 

•  Response  time  is  approximately  inversely  proportional  to  the  number  of  (physical)  processing  elements 
on  the  machine.  This  can  be  attributed  to  the  increased  computing  power  and  the  lower  “density” 
(with  fewer  knowledge  base  elements  per  processor)  which  results  in  enhanced  parallelism. 

• The time taken to answer a query ranges from a fraction of a second to a few tens of seconds.

•  An  inherent  problem  with  the  use  of  parallel  variables  on  the  CM-2  is  inefficient  memory  usage.  Since 
the  number  of  virtual  processors  must  always  be  a  power  of  two,  this  could  potentially  lead  to  significant 
waste  of  memory.  There  appears  to  be  no  simple  solution  to  this  problem  without  breaking  out  of 
SIMD  operation.  SPMD  implementations  on  the  CM-5  avoid  this  problem  entirely. 

•  The  maximum  size  of  the  knowledge  base  that  can  be  encoded  on  a  machine  depends  on  the  total 
amount  of  memory  available  on  the  machine.  In  addition,  with  increasingly  large  knowledge  bases,  the 
communication  bottleneck  would  also  significantly  slow  down  the  system. 

B  Multiple  Instantiation — Some  Technical  Details 

Multiple instantiation in both shruti-cm2 and shruti-cm5 is handled without the use of switches (Mani
and Shastri, 1993). When spreading activation in the network, predicate and concept banks are considered
one at a time. In other words, in a given iteration of the activation propagation loop (see Figure 8) only one
active bank of a predicate or concept will be considered. This technique could cause indefinite waits in the
rule base. To illustrate the problem, suppose we are currently considering bank i of predicate P. Let P be
the consequent of rules r_i and r_j. Let R_i and R_j be rule-structures that represent r_i and r_j. At propagation
step t, suppose r_i fires and r_j does not. The fact that r_i fired for bank i of P will be noted in R_i, and R_i
can shift its focus to the next active bank i + 1 in the next propagation step. Since r_j did not fire, R_j is
stuck at bank i. R_j cannot skip bank i and go on to bank i + 1 since r_j could fire later due to activation
propagating in the type hierarchy. We circumvent this problem by defining special protocols.

Note that this problem does not arise in the type hierarchy since all is-a links originating at a concept
always fire: unlike a rule, no preconditions need to be satisfied for an is-a link to fire.

Shruti-cm2 

Let Dth be the depth of the type hierarchy. Then,

• If a rule r for bank i of some predicate fires at time step t, then update R, the structure representing r,
to consider bank i + 1 of the corresponding predicate in step t + 1 (subject to the conditions mentioned
below).

• If a rule r for bank i of some predicate does not fire at time step t, then two cases are possible:

1. If t < Dth, then do not update R. Thus, bank i will be reconsidered in step t + 1.

2. If t > Dth, update R to consider bank i + 1 in the next time step.[9]

Since activation spread in the type hierarchy will not activate any new concepts after Dth time steps, this
scheme  ensures  that  all  banks  of  a  predicate  will  eventually  be  considered. 
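
The update rule for a rule-slot can be written as a small C function. This is a sketch under stated assumptions, not the authors' code: the name `next_bank` and its parameters are hypothetical, and because the boundary case t = Dth is not clearly specified above, the sketch treats it as still waiting.

```c
#include <stdbool.h>

/* Bank that a rule-slot should consider in step t + 1, given that it
 * considered bank i in step t. d_th is the depth of the type hierarchy. */
int next_bank(int i, int t, int d_th, bool fired)
{
    if (fired)
        return i + 1;   /* rule fired for bank i: move on */
    if (t > d_th)
        return i + 1;   /* type hierarchy has settled: stop waiting */
    return i;           /* rule may still fire later: reconsider bank i */
}
```

Since the second branch is taken unconditionally once t exceeds Dth, no rule-slot can remain stuck on a bank indefinitely.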

Shruti-cm5 

In shruti-cm5, the multiple instantiation indefinite wait problem is handled by placing special elements on
the rule-frontier. Normally, a rule-frontier element is a (consequent) predicate, along with the bank that was
instantiated. All rules for that predicate bank are considered in a given propagation step. If any rule does
not fire for this bank, then a special pair of elements is added to the rule-frontier. This pair specifies the
predicate bank and the associated rule that need to be reconsidered in the next propagation step. Whenever
such a pair is encountered on the rule-frontier, only the specified rule is processed. If subsequent banks of
the predicate become active, these predicate banks will be placed on the frontier as usual, irrespective of the
fact that previous banks could have rules which have not yet fired.

[9] Whenever any rule-slot R is updated to consider an inactive predicate bank, R waits till an instance has been assigned to
that bank.

Figure 20: Shruti-CM2 running on a CM-5 with 64 processors. The processing nodes on the CM-5 are used
in SIMD mode. The graph shows the effect of the size of the knowledge base on response time for queries
which require inference depths ranging from 0 to 10. Queries were not randomly generated. The knowledge
base used was not structured.


C  Shruti-cm2  on  the  CM-5 

In this section, we briefly evaluate shruti-cm2 running on the CM-5. Since shruti-cm2 is written in C*,
and a C* compiler is available for the CM-5, SHRUTI-CM2 was recompiled and run on the CM-5. Shruti-cm2
running on the CM-5 uses the CM-5 in data-parallel (SIMD) mode. Figure 20 summarizes the results. Comparing
with Figures 18 and 19, we observe that the performance of SHRUTI-CM2 on the CM-5 is comparable
to that on the CM-2,[10] though message passing on the CM-5 appears to be more robust.


[10] The rule of thumb seems to be that a 32 node CM-5 is approximately equivalent to a CM-2 with 8K processing elements.

D Input Syntax for Rules, Facts and Queries

To illustrate the input syntax for rules, facts and is-a relations, we begin with an extension of the example
in Section 6.7.


/* RULES */

forall x,y,z [give(x,y,z) => own(y,z)];
forall x,y [buy(x,y) => own(x,y)];
forall x,y [own(x,y) => can_sell(x,y)];
forall x,y [sibling(x,y) & born_together(x,y) => twins(x,y)];
forall x,y [preys_on(x,y) => scared_of(y,x)];
forall x,y,z [move(x,y,z) => present(x,z,t)];
forall x,y,z [move(x,y,z) => present(x,y,t)];
forall x,y exists t
    [born(x,y) => present(x,y,t)];
forall x:Animate, y:Solid_obj
    [walk_into(x,y) => hurt(x)];


/* FACTS */

give (John, Mary, Book1);
give (x, Susan, Ball2);
forall x:Cat, y:Bird preys_on (x,y);
exists x:Robin [own(Mary,x)];

/* IS-A FACTS */
is-a (Bird,Animal);
is-a (Cat,Animal);
is-a (Robin,Bird);
is-a (Canary,Bird);
is-a (Tweety,Canary);
is-a (Sylvester,Cat).

Note: Any text included between /'s is a comment. The comments given above are enclosed between /*
... */ so that they look identical to comments in C code.

The above example illustrates the input syntax accepted by the parallel rapid reasoning systems. Most
of the features are self-evident. Some points to be noted regarding the input syntax follow. Items prefixed
by a dagger (†) are supported only by shruti-cm5.

• A rule meant for the backward reasoner is said to be balanced if the following conditions are satisfied:

- Repeated variables in the antecedent are also present in the consequent.

- Typed variables, existential variables and entities present in the antecedent are also present in
the consequent.

Only balanced rules will be accepted by the system. Rules which do not satisfy the above conditions
will be rejected. A warning message to this effect will be printed.

• Any variable (used in a rule) which is not listed in either the list of universally quantified variables or
in the list of existentially quantified variables is assumed to be existentially quantified.

• Any name beginning with an uppercase alphabetic character is assumed to be an entity. All names
beginning with lowercase are variable names. Names of predicates can begin with either uppercase
or lowercase letters. Capitalization of names should be used consistently; for example, name1 and
Name1 would represent two different predicates; similarly, Const_a and Const_A are different entities.

• A semicolon (;) indicates that a rule, fact or is-a fact has been entered; it also indicates that more
input is to follow. The occurrence of a period (.) in the input indicates the end of a rule, fact or
is-a fact and also terminates the input. A (quantified or unquantified) predicate terminated by a ? is
interpreted as a query.

• The lexical analyzer removes all whitespace; the input is therefore unaffected by the addition of extra
blanks, tabs or newlines. Further, spaces can be omitted wherever they are not essential.[11]

• The lexical analyzer also removes all comments. Any text enclosed between /'s (/ ... /) is a comment.
The text of a comment can contain any character or symbol except /. A comment can start
and end at any point in the input. In particular, a comment may span several lines or may be limited
to part of a single input line.

• †Tags. Predicates and entities can be tagged (with a non-zero, positive integer) by using the < >
construct: <give(x,y,z),3> or <Mary,6>. Tags can be used to group “similar” predicates and entities
together.

•  Error  Handling.  When  syntax  errors  are  detected  in  the  input,  the  action  taken  depends  on  the 
mode  of  input: 

-  If  input  is  being  read  from  the  terminal  (stdin),  an  error  message  is  issued,  and  the  last  rule  or 
fact  should  be  re-entered  after  typing  one  or  more  semi-colons  (;). 

- If input is being read from a file, the parser prints the line number containing the syntax error
and continues reading the file, so that all syntax errors in the file are listed. Rules or facts in the
input  that  were  correctly  recognized  (i.e.,  had  no  syntax  error)  will  be  encoded;  the  others  will 
be  ignored. 
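
The comment rule in the list above (everything between one / and the next / is dropped) can be sketched in C as follows. The function name `strip_comments` is hypothetical, and the handling of an unterminated comment (it swallows the rest of the input) is an assumption, since the text does not cover that case.

```c
/* Copy src to dst, dropping every comment, i.e. everything from one '/'
 * up to and including the next '/'. dst must have room for as many bytes
 * as src, including the terminating NUL. */
void strip_comments(const char *src, char *dst)
{
    int in_comment = 0;
    for (; *src != '\0'; src++) {
        if (*src == '/')
            in_comment = !in_comment;   /* opening or closing slash */
        else if (!in_comment)
            *dst++ = *src;              /* ordinary input character */
    }
    *dst = '\0';
}
```

Because only the slashes delimit a comment, C-style markers such as /* FACTS */ are removed as well: the asterisks are simply part of the comment text.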

Below is the formal grammar for the input language (for rules, facts, is-a relations and queries) which
specifies the exact form of each input structure. The grammar is accurate for SHRUTI-CM5. Though most
of the constructs are identical in shruti-cm2, there are some minor differences. Further, SHRUTI-CM2 does
not support tags.


input       → .                                     /* stop - no more input */
            | ; input
            | input-item input                      /* continue - more input */

input-item  → query                                 /* query */
            | fact                                  /* fact */
            | rule                                  /* rule */
            | tag-def                               /* tag definition */

rule        → q-prefix [ pred-list => predicate ]
            | pred-list => predicate

fact        → predicate
            | q-pred

query       → predicate ?
            | q-pred ?

tag-def     → < predicate , NUM >
            | < constant , NUM >

q-pred      → q-prefix [ predicate ]

q-prefix    → FORALL type-list
            | EXISTS type-list
            | FORALL type-list EXISTS type-list
            | EXISTS type-list FORALL type-list

type-list   → variable
            | variable : constant
            | variable , type-list
            | variable : constant , type-list

pred-list   → predicate & pred-list
            | predicate

predicate   → arg-or-pred ( arg-list )
            | arg-or-pred ( )

arg-list    → arg-or-pred , arg-list
            | arg-or-pred

arg-or-pred → constant | variable
constant    → CONST
variable    → VAR

[11] To distinguish between the variable 'forallx' and 'forall x', a space is essential. But a space is not required after the ','
in 'own(x,y)'. In general, spaces are not essential before and after punctuation symbols.

Here, CONST represents entities (any token starting with an uppercase letter). VAR represents variables
(quantified or unquantified) in the rules, facts or queries; these are tokens beginning with a lowercase
letter. Variable and entity tokens consist of a sequence of alphanumeric characters along with _ and «.
Any integer is recognized as a NUM. The tokens FORALL and EXISTS are recognized when the input contains
these words, spelled with any combination of uppercase and lowercase letters (i.e., arbitrarily capitalized).
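
The token classes described above can be illustrated with a small tokenizer. This is only a sketch under stated assumptions, not the SHRUTI-CM lexer: the function name `tokenize`, the regular expression, the use of `&` as the conjunction symbol, and the restriction of names to alphanumerics and underscore are all assumptions made for illustration.

```python
import re

# Assumed token shapes: integers, names (letters, digits, underscore),
# the "=>" arrow, and single-character punctuation from the grammar.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+)
  | (?P<NAME>[A-Za-z][A-Za-z0-9_]*)
  | (?P<IMPLIES>=>)
  | (?P<PUNCT>[()\[\]<>,:;.?&])
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (kind, value) pairs for one line of input."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():          # spaces only separate tokens
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "NAME":
            if value.lower() in ("forall", "exists"):
                kind = value.upper()     # keywords, arbitrarily capitalized
            elif value[0].isupper():
                kind = "CONST"           # entities start with an uppercase letter
            else:
                kind = "VAR"             # variables start with a lowercase letter
        elif kind in ("PUNCT", "IMPLIES"):
            kind = value                 # punctuation is its own token kind
        tokens.append((kind, value))
        pos = m.end()
    return tokens
```

For instance, `tokenize("forallx")` yields a single VAR token, whereas `tokenize("ForAll x")` yields a FORALL token followed by a VAR token, which is why the separating space matters.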
E  SHRUTI-CM Commands


Commands recognized by SHRUTI-CM2 and SHRUTI-CM5 are listed below. Some of the commands and
descriptions are applicable only to SHRUTI-CM5 and are prefixed by a dagger (†). The SHRUTI-CM5 preprocessor
only supports the commands i, w and q. Each command is invoked by using a single character. The first
non-blank character typed at the input prompt is taken to be the command. Any non-blank text following
the first character forms the argument(s) for the command. The list below indicates the purpose of the
command, the command syntax and a brief description of the command.
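
The way a typed line is split into a command and its arguments can be sketched as follows. The helper name `split_command` is hypothetical, and splitting the arguments on whitespace is an assumption; the manual only states that the first non-blank character is the command and the rest forms the argument(s).

```python
def split_command(line):
    """Split one line typed at the prompt into (command, arguments)."""
    stripped = line.strip()
    if not stripped:
        return None, []               # blank line: no command given
    command = stripped[0]             # first non-blank character is the command
    arguments = stripped[1:].split()  # remaining non-blank text (assumed whitespace-separated)
    return command, arguments
```

Under these assumptions, `split_command("  i -b kb.txt")` gives the command `i` with the arguments `-b` and `kb.txt`.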

Quit  Syntax:  q 

Terminates the SHRUTI-CM program.

Help  Syntax:  ? 

Prints  out  a  list  of  available  commands  and  the  command-line  options  and/or  arguments  which  the 
commands  accept. 

Read Input  Syntax: i [ -f | -b ] [input-file]

Reads input from the terminal (when input-file is not specified) or a file (when input-file is
specified). The -b option is used to build a backward reasoning system (default), while the -f option
builds a forward reasoning system (currently unsupported).

†In SHRUTI-CM5 the behavior of this command is dictated by the current input mode. The system
always starts up in parallel asynchronous mode; the mode can be changed using the m command. In
parallel asynchronous mode, each processor in the partition processes a different input file input-file.pid
where pid is a three digit processor index (prefixed by zeros if necessary). In global synchronous mode,
all processors cooperatively process the same input file input-file.
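
As an illustration of the per-processor naming scheme in parallel asynchronous mode, the following sketch lists the file names that a partition would read. The function name `per_processor_files` is hypothetical, and numbering processors from 0 is an assumption; the manual only specifies a three digit, zero-prefixed index.

```python
def per_processor_files(input_file, num_processors):
    """List the input files read in parallel asynchronous mode, one per processor."""
    # Assumption: processor indices run from 0 to num_processors - 1.
    return [f"{input_file}.{pid:03d}" for pid in range(num_processors)]
```

With a prefix of `kb` and a 32 node partition, this would produce names from `kb.000` through `kb.031`.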

†Syntax: i [-h hash-table-file] [ -f | -b ] [input-file]

The -h option for read input is supported by the SHRUTI-CM5 preprocessor and can be used to update
the internal server hash tables which store processor assignment and other details for predicates and
concepts. This feature is useful for incremental preprocessing of large knowledge bases.

†Change Input Mode  Syntax: m [ -p | -g ]

Changes input mode to parallel asynchronous (with the -p option) or to serial, global synchronous
(with the -g option). Without any option, this command prints out the current input mode. The
current input mode dictates the behavior of the i command.

†Write Out Hash Table  Syntax: w [-o output-file-prefix]

Writes out the current server hash tables to the specified file (with a .hashtables extension). If no
output file prefix is given, kb.pp is used as default. The hash tables written out can be read by the
preprocessor (using the i command with the -h option), which supports incremental preprocessing of
large knowledge bases.

†Syntax: w [ -g ] [-o output-file-prefix]

This command, when used on the SHRUTI-CM5 preprocessor, writes out the preprocessed knowledge
base. The output file names are suffixed with the processor number. If the output file prefix is not
specified, kb.pp is used as the default. If the -g option is absent, the inference dependency graph for
the knowledge base is also written out (with file extension .idg).

Run Reasoning Episode  Syntax: r [ [-f ] #steps]

Runs the reasoning episode after a query has been posed. It is an error to invoke this command when
a query has not been posed. Without any options or arguments, r runs the reasoning episode to
completion, that is, until the query is answered or the reasoning episode has proceeded long enough to
conclude that there will be no answer. When #steps is specified with the -f option, the reasoning episode
is forced to run for #steps propagation steps (irrespective of whether the query has been answered or
not). If the -f option is not specified, the reasoning episode terminates either after #steps cycles or
after the query has been answered, whichever happens first.

†Since SHRUTI-CM5 runs reasoning episodes asynchronously, this command does not support the -f
and/or #steps arguments.
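
The termination rules of the stepwise (non-asynchronous) form of the run command can be summarized in a short sketch. Everything here is illustrative: `run_episode`, the callbacks `step` and `is_answered`, and the `give_up_after` cut-off (standing in for "long enough to conclude that there will be no answer") are hypothetical names and values, not part of SHRUTI-CM.

```python
def run_episode(step, is_answered, steps=None, force=False, give_up_after=100):
    """Run propagation steps and return how many were taken.

    step()        -- performs one propagation step (hypothetical callback)
    is_answered() -- True once the query has been answered (hypothetical callback)
    steps         -- optional step count given on the command line
    force         -- corresponds to the -f option
    """
    if force and steps is None:
        raise ValueError("forcing a run requires a step count")
    # Without a step count, fall back to an assumed give-up bound.
    limit = steps if steps is not None else give_up_after
    taken = 0
    while taken < limit:
        if not force and is_answered():
            break                 # stop early unless the run is forced
        step()
        taken += 1
    return taken
```

In this sketch a forced run always performs exactly the requested number of steps, while an unforced run stops at the step count or at the first answer, whichever comes first.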

Reset Network  Syntax: z [ -q | -v ]

Resets  the  network  and  removes  all  activation  including  the  query.  With  the  -v  option,  a  message  is 
printed  out  indicating  that  the  network  has  been  reset  (default).  The  message  can  be  suppressed  by 
using  the  -q  option. 

Set Phases  Syntax: p [#phases]

Sets the number of phases per clock cycle to #phases. The current number of phases is printed out if
the command is invoked without an argument.

Display  Syntax: d { -p | -c } name

Displays the current instantiations of the predicate (with the -p option) or concept (with the -c option)
specified by name. An error message is printed if the named predicate or concept is not present in the
system.

†Syntax: d { -p name | -c name }+

SHRUTI-CM5 supports multiple -p and/or -c options.

Statistics  Syntax: s [ -a | -k | -q | -c | -s ]

Prints out knowledge base and reasoning episode statistics. When the system is configured for detailed
statistics collection, this command will print out more information. The -a option prints out all the
accumulated data (default). The -k option prints out information about the knowledge base. All
details about the current reasoning episode are printed out by the -q option. The -c and -s options
print out cumulative data and data from the last propagation step respectively, for the current query.

†Due to the asynchronous nature of the SHRUTI-CM5 system, a global propagation step is not well
defined. Hence, SHRUTI-CM5 does not support the -c and -s options.

†Display Tagged Activation  Syntax: a -f first-tag [-l last-tag]

Displays the number of active predicates and entities with tag values in the specified range. If the -l
option is not specified, active predicates and entities with tag value equal to first-tag are printed.

†Display Processor Load  Syntax: l [ -a | -k | -q | -t ] [-n processor]

Prints out the processor load for the current reasoning episode. When the system is configured for
detailed statistics collection, this command will print out more information. The -a option prints
out all information (default). The -k option prints out the distribution of the knowledge base on the
processing elements. The distribution of active elements for the current reasoning episode is printed
out by the -q option. The timing for individual processors (for the current reasoning episode) is
displayed by the -t option. If the -n option is given, the required information is displayed for the specified
processor. If the -n option is not used, data is displayed for all processors in the partition.
Abstract

Polynomial time complexity is the usual ‘threshold’ for
distinguishing the tractable from the intractable, and it may
seem reasonable to adopt this notion of tractability in the
context of knowledge representation and reasoning. It is argued
that doing so may be inappropriate in the context of common
sense reasoning underlying language understanding. A more
stringent criterion of tractability is proposed. A result about
reasoning that is tractable in this stronger sense is outlined.
Some unusual properties of tractable reasoning emerge when the
formal specification is grounded in a neurally plausible
architecture.

1  Introduction 

Understanding language is a complex task. It involves, among
other things, carrying out inferences in order to establish
referential and causal coherence, generate expectations, and
make predictions. Nevertheless we can understand language at
the rate of several hundred words per minute [Carpenter and
Just, 1977]. This rapid rate of language understanding suggests
that we can (and do) perform a wide range of inferences very
rapidly, automatically and without conscious effort, as though
they are a reflex response of our cognitive apparatus. In view
of this, such reasoning may be described as reflexive
[Shastri, 1991].

As an example of reflexive reasoning, consider the sentence
‘John seems to have suicidal tendencies, he has joined the
Colombian drug enforcement agency.’ We can understand this
sentence spontaneously and without any deliberate effort even
though doing so involves the use of background knowledge and
reasoning. Informally, this reasoning may be as follows: joining
the Colombian drug enforcement agency has dangerous
consequences, and as John may be aware of this, his decision to
join the agency suggests that he has suicidal tendencies. As
another example of reflexive reasoning, consider the inference
‘John owns a car’ upon hearing ‘John bought a Rolls-Royce’. We
can make this inference effortlessly even though it requires
multiple steps of inference using background knowledge such as
‘Rolls-Royce is a car’ and ‘if x buys y then x owns y’.

*This work was supported by NSF grant IRI 88-05465
and ARO grant DAAL 03-89-C-0031.

Not all reasoning is reflexive and, as complexity theory tells
us, not all reasoning can be. We contrast reflexive reasoning
with reflective reasoning, that is, reasoning that requires
reflection, conscious deliberation, and at times, the use of
external props such as paper and pencil (e.g., solving logic
puzzles, doing cryptarithmetic, or planning a vacation).

2  Reflexive  reasoning  necessitates  a 
strong  notion  of  tractability 

In order to quantify the notion of reflexive reasoning
introduced above, let us make a few observations about
such reasoning.

•  Reflexive reasoning occurs with respect to a large
body of background knowledge. A serious attempt
at compiling common sense knowledge suggests that
our background knowledge base may contain as
many as $10^7$ to $10^8$ items [Guha and Lenat, 1990].
This should not be very surprising given that this
knowledge includes, besides other things, our knowledge
of naive physics and naive psychology; facts
about ourselves, our family, friends, colleagues, history
and geography; our knowledge of artifacts,
sports, art, music; some basic principles of science
and mathematics; and our models of social, civic,
and political interactions.

•  Items in the background knowledge base are fairly
stable and persist for a long time once they are
acquired. Hence this knowledge is best described
as long-term knowledge and we will refer to this
body of knowledge as the long-term knowledge base
(LTKB).

•  Episodes of reflexive reasoning are triggered by
‘small’ inputs. In the context of language understanding,
an input (typically) corresponds to a sentence
that would map into a small number of assertions.
For example, the input ‘John bought a
Rolls-Royce’ maps into just one assertion (or a few,
depending on the underlying representation). The
critical observation is that the size of the input, |In|,
is insignificant compared to the size of the long-term
knowledge base |LTKB|.¹,²

•  The vast difference in the magnitude of |LTKB|
(about $10^8$) and |In| (a few) becomes crucial when
analyzing the tractability of common sense reasoning.
Given the actual values of |In| that occur during
common sense reasoning, there is a distinct possibility
that the overall cost of a derivation may be
dominated by the “fixed” contribution of |LTKB|.
Thus we cannot ignore the cost attributable to
|LTKB| and we must analyze the complexity of reasoning
in terms of |LTKB| as well as |In|.

In view of the magnitude of |LTKB|, even a cursory
analysis suggests that any inference procedure whose
time complexity is quadratic or worse in |LTKB| cannot
provide a plausible computational account of reflexive
reasoning. However, a process that is polynomial in |In|
remains viable.

2.1  Time  complexity  of  reflexive  reasoning 

Observe that although the size of a person’s |LTKB|
increases considerably from, say, age seven to thirty,
the time taken by a person to understand natural language
does not. This suggests that the time taken by
an episode of reflexive reasoning does not depend on
|LTKB|. In view of this it is proposed that a realistic
criterion of tractability for reflexive reasoning is one where
the time taken by an episode of reflexive reasoning is
independent of |LTKB| and only depends on the depth of
the derivation tree associated with the inference.³

2.2  Space  complexity  of  reflexive  reasoning 

The expected size of the LTKB also rules out any computational
scheme whose space requirement is quadratic
(or higher) in the size of the KB. For example, the brain
has only about $10^{12}$ cells, most of which are involved
in processing of sensorimotor information. Hence even
a linear space requirement is fairly generous and leaves
room only for a modest ‘constant of proportionality’. In
view of this, it is proposed that the admissible space
requirement of a model of reflexive reasoning be no more
than linear in |LTKB|.
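
The space argument can be made concrete with a little arithmetic. The sketch below assumes an LTKB of about 10^8 items and a budget of about 10^12 cells, as read from the estimates quoted in this section; the variable names are illustrative only.

```python
LTKB_ITEMS = 10**8      # assumed size of the long-term knowledge base
BRAIN_CELLS = 10**12    # assumed total number of cells available

# A quadratic space requirement would need on the order of |LTKB|^2 nodes.
quadratic_cells = LTKB_ITEMS ** 2

# A linear requirement leaves this many cells per item at most,
# i.e. the largest possible 'constant of proportionality'.
per_item_budget = BRAIN_CELLS // LTKB_ITEMS

print(quadratic_cells > BRAIN_CELLS)  # quadratic space exceeds the budget
print(per_item_budget)                # upper bound on cells per item
```

Under these assumed figures a quadratic scheme would need about 10^16 nodes, four orders of magnitude more than the available cells, while a linear scheme could spend at most about 10^4 cells per item even if every cell were devoted to reasoning.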

To summarize, it is proposed that as far as (reflexive)
reasoning underlying language understanding is concerned,
the appropriate notion of tractability is one where

•  the reasoning time is independent of |LTKB| and is
only dependent on |In| and the depth of the derivation
tree associated with the inference, and

•  the associated space requirement, i.e., the space
required to encode the LTKB plus the space required
to hold the working memory during reasoning,
is no worse than linear in |LTKB|.

¹A small input may, however, lead to a potentially large
number of elaborate inferences. For example, the input ‘John
bought a Rolls-Royce’ may generate a number of reflexive
inferences such as ‘John bought a car’, ‘John owns a car’,
‘John has a driver’s license’, ‘John is perhaps a wealthy man’,
etc.

²Some of these inferences may be ‘soft’ inferences, but the
issue of deductive versus evidential nature of inferences is
irrelevant to our current concerns.

³The restriction that the reasoning time be independent
of |LTKB| may seem overly strong and one might argue that
perhaps logarithmic time may be acceptable. Our belief that
the stronger notion of effectiveness is relevant, however, is
borne out by results which demonstrate that there does exist
a class of reasoning that can be performed in time
independent of |LTKB|.

In spite of the apparent significance of reflexive reasoning,
there have been very few attempts at developing
a computational account of such inference. Past
exceptions are Fahlman’s work on NETL [1979]
and Shastri’s work on a connectionist semantic memory
[1988]. However, these models dealt primarily with
inheritance and classification within an IS-A hierarchy.
Hölldobler [1990] and Ullman and van Gelder [1988] have
proposed parallel systems for performing quite complex
logical inferences; however, both these systems have
unrealistic space requirements. The number of nodes in
Hölldobler’s system is quadratic in the size of the
knowledge base (KB); the number of processors required
by Ullman and van Gelder is even higher. Ullman and
van Gelder treat the number of nodes required to encode
the background KB as a fixed cost, and hence do not
refer to its size in computing the space complexity of their
system. If the size of such a KB is taken into account,
the number of processors required by their system turns
out to be a high degree polynomial.

A significant amount of work has been done by researchers
in knowledge representation and reasoning to
identify classes of limited inference that can be performed
efficiently (e.g., see [Frisch and Allen, 1982];
[Brachman and Levesque, 1984]; [Patel-Schneider, 1985];
[Dowling and Gallier, 1984]; [Levesque, 1988]; [Selman
and Levesque, 1989]; [McAllester, 1990]; [Bylander et
al., 1991]; [Kautz and Selman, 1991]). This work has
covered a wide band of the complexity spectrum, but
none of it meets the strong tractability requirement
discussed above. Most results stipulate polynomial time
complexity, restrict inference in implausible ways (e.g.,
by excluding chaining of rules), and/or deal with limited
expressiveness (e.g., deal only with propositions).

3  A  tractable  reasoning  class 

Below we describe a class of reasoning that is tractable
with reference to the criteria stated above. The characterization
of such a class is different (but analogous) for
forward and backward reasoning. In this paper we will
focus on backward reasoning.

Some  definitions: 

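Let us define rules to be first-order sentences of the form:

$\forall x_1, \ldots, x_m\; [P_1(\ldots) \wedge P_2(\ldots) \wedge \cdots \wedge P_n(\ldots) \Rightarrow \exists z_1, \ldots, z_l\; Q(\ldots)]$

where the arguments of the $P_i$’s are elements of $\{x_1, \ldots, x_m\}$,
and an argument of $Q$ is either an element of $\{x_1, \ldots, x_m\}$,
an element of $\{z_1, \ldots, z_l\}$, or a constant. □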

Any  variable  that  occurs  in  multiple  argument  positions 
in  the  antecedent  of  a  rule  is  a  pivotal  variable.  □ 


Note  that  the  notion  of  a  pivotal  variable  is  local  to  a 
rule. 

A  rule  is  balanced  if  all  pivotal  variables  occurring  in  the 
rule  also  appear  in  its  consequent.  □ 

For example, the rule $\forall x,y,z\; P(x,y) \wedge R(x,z) \Rightarrow S(y,z)$
is not balanced because the pivotal variable x
does not occur in the consequent. On the other hand,
the rule $\forall x,y,z\; P(x,y) \wedge R(x,z) \Rightarrow S(x,z)$ is balanced
because the pivotal variable x does occur in the consequent.
The fact that y does not appear in the consequent
is immaterial because y occurs only once in the
antecedent and hence is not a pivotal variable.
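
The definitions of pivotal variable and balanced rule translate directly into a small check. This is an illustrative sketch, not part of SHRUTI: the representation of a predicate as a `(name, args)` pair, the function names, and the convention that variables are lowercase-initial strings (borrowed from the input language) are assumptions.

```python
from collections import Counter

def pivotal_variables(antecedent):
    """Variables occurring in more than one argument position of the antecedent.

    antecedent: list of (predicate_name, [arguments]) pairs.
    Assumption: variables are strings starting with a lowercase letter.
    """
    counts = Counter(
        arg for _, args in antecedent for arg in args if arg[:1].islower()
    )
    return {var for var, n in counts.items() if n > 1}

def is_balanced(antecedent, consequent):
    """A rule is balanced if every pivotal variable appears in the consequent."""
    _, consequent_args = consequent
    return pivotal_variables(antecedent) <= set(consequent_args)

# The two example rules from the text.
antecedent = [("P", ["x", "y"]), ("R", ["x", "z"])]
print(is_balanced(antecedent, ("S", ["y", "z"])))  # x is pivotal but missing
print(is_balanced(antecedent, ("S", ["x", "z"])))  # x is pivotal and present
```

The first call reports the rule as unbalanced and the second as balanced, matching the two examples above.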

Facts are partial or complete instantiations of predicates.
Thus facts are atomic formulae of the form $P(t_1, t_2, \ldots, t_k)$
where the $t_i$’s are either constants or distinct existentially
quantified variables. □

Queries have the same form as facts. Let us distinguish
between yes-no queries and wh-queries. A query all of
whose arguments are bound to constants corresponds to
the yes-no query: ‘Does the query follow from the rules
and facts encoded in the long-term memory of the system?’
A query with existentially quantified variables,
however, has several interpretations. For example, the
query P(a, x), where a is a constant and x is an existentially
quantified argument, may be viewed as the yes-no
query: ‘Does P(a, x) follow from the rules and facts for
some value of x?’ Alternately this query may be viewed
as the wh-query: ‘For what values of x does P(a, x) follow
from the rules and facts in the system’s long-term
memory?’ □

Consider a query Q and an LTKB consisting of facts and
balanced rules. A derivation of Q obtained by backward
chaining is threaded if all pivotal variables occurring
in the derivation get bound and their bindings can
be traced back to the bindings introduced in Q. □

Given an LTKB consisting of facts and balanced rules, a
reflexive query is one for which there exists a threaded
proof. □

3.1  A  class  of  tractable  reasoning 

The worst-case time for answering a reflexive yes-no
query, Q, is proportional to $V\,|In|^{V} d$, where:

•  |In| is the number of distinct constants in Q.

•  V is as follows: Let $V_i$ be the arity of the predicate
$P_i$. Then $V = \max_i V_i$, with i ranging over all the
predicates in the LTKB.

•  d equals the depth of the shallowest derivation of Q
given the LTKB.

Observe that the worst-case time is i) independent of
|LTKB|, ii) polynomial in |In| and iii) only proportional
to d.

As observed in Section 2, while |LTKB| may be as
much as $10^8$, |In| is simply the number of (distinct)
‘entities’ referred to in the input. In the context of natural
language understanding, |In| would be quite small
(typically, less than 5). We also expect V, the maximum
arity of predicates in the LTKB, to be quite small.
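
To get a feel for the magnitude of the worst-case bound, the expression V |In|^V d can be evaluated for small parameter values. The function name and the sample values below are illustrative assumptions, and the result is a count up to the unspecified constant of proportionality, not a time in any particular unit.

```python
def worst_case_bound(max_arity, num_constants, depth):
    """Evaluate V * |In|**V * d, the worst-case bound up to a constant factor."""
    return max_arity * num_constants ** max_arity * depth

# Hypothetical example: predicates of arity at most 3, a query mentioning
# 2 distinct constants, and a shallowest derivation of depth 4.
print(worst_case_bound(3, 2, 4))  # 3 * 2**3 * 4
```

The size of the knowledge base does not appear among the arguments, which is the point of the result: adding rules and facts changes the bound only if it changes V or d.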


An answer to a wh-query can also be computed in time
proportional to $V\,|In|^{V} d$, except that |In| now equals the
arity of the query predicate Q.

The space requirement is linear in |LTKB| and polynomial
in |In|. This includes the cost of encoding the
LTKB as well as the cost of maintaining the dynamic
state of the ‘working memory’ during reasoning.

An  informal  explanation  of  the  result 

The number of times a predicate P may get instantiated
in a threaded derivation of a query cannot exceed $|In|^{V}$.
This follows from the observation that P has at most V
arguments and each of these can get bound to at most
|In| distinct constants. Since each predicate instantiation
can contain at most V bindings, the propagation
of argument bindings from one predicate to another can
be carried out in time proportional to $V\,|In|^{V}$. This
assumes that the correspondence (specified by the rules in
the LTKB) between the arguments of the antecedent and
consequent predicates is hard-wired.

It can be shown that the propagation of argument
bindings from multiple predicates to a predicate can be
carried out in parallel (see [Mani and Shastri, 1992] for a
possible implementation of such a parallel binding propagation
scheme). This means that the time required to
carry out one step of a parallel breadth-first derivation
is only proportional to $V\,|In|^{V}$. It follows that the time
required to carry out a d step parallel derivation is
proportional to $V\,|In|^{V} d$.

Lower  bound  nature  of  above  result 

In general, derivations that involve unbalanced rules
or those that do not satisfy the threaded property cannot
be computed in time independent of |LTKB| if
the available space is no more than linear in |LTKB|
[Dietz et al., 1993]. This result follows from the observations
that i) the common-element problem, i.e., the
problem of determining whether two sets share a common
element, can be reduced to the problem of computing
a derivation involving unbalanced rules and/or
non-threaded derivations, ii) the sorting problem can be
reduced to the common-element problem, and iii) the
lower bound on the sorting problem is $n \log n$ (where n
would correspond to |LTKB|). Thus derivations involving
unbalanced rules and non-threaded derivations
may not be computed in time independent of |LTKB|
unless one makes use of more than linear space.


3.2  Worst-case  versus  expected  case 

The  above  result  offers  a  worst-case  characterization 
which  assumes  that  during  the  derivation,  all  variables 
will  get  instantiated  with  all  possible  bindings  involving 
constants  in  Q.  This  will  not  be  the  case  in  a  typical 
situation.  In  fact  it  may  be  conjectured  that  in  a  typical 
episode  of  reasoning,  the  actual  time  will  seldom  exceed 
50d  (see  next  section). 


4  A  neurally  motivated  model  of 
tractable  reasoning 

We have proposed a neurally plausible model (SHRUTI)
that can encode an LTKB of the type described above,
together with a term hierarchy, and perform a class of
forward as well as backward reasoning with extreme
efficiency [Shastri and Ajjanagadde, 1990]; [Ajjanagadde
and Shastri, 1991]; [Mani and Shastri, 1991]; [Mani and
Shastri, 1992]; [Shastri, 1992]. SHRUTI can draw inferences
in time that is only proportional to the depth
of the shallowest derivation leading to the conclusion.
A SHRUTI-like model has also been used by Henderson
[1992] to design a parser for English. The parser’s
speed is independent of the size of the lexicon and the
grammar, and it offers a natural explanation for a variety
of data on long distance dependencies and center
embedding.

If we set aside SHRUTI’s ability to perform terminological
reasoning, the class of reasoning that SHRUTI
can perform efficiently is a subclass of the class of reasoning
specified in the previous section. The additional
restrictions placed on SHRUTI’s reasoning ability are
motivated by gross constraints on the speed at which
humans can perform reflexive reasoning and gross
neurophysiological parameters such as:

1.  $\pi_{max}$, the maximum period at which nodes can be
expected to sustain synchronous activity,

2.  $\omega$, the tolerance or the minimum lead/lag that must
be allowed between the spiking of two nodes that are
firing in synchrony,

3.  the time it takes a cluster of synchronous nodes to
drive a connected cluster of nodes to fire in synchrony.

The details of the model are beyond the scope of this
paper and the reader is referred to [Shastri and Ajjanagadde,
1990]. Let us, however, state the additional constraints
on the class of reasoning SHRUTI can perform.

4.1  Additional  constraints  on  the  reasoning 
performed  by  SHRUTI 

SHRUTI can encode an LTKB of facts and balanced rules
and answer yes to any reflexive yes-no query in time
proportional to the depth of the shallowest derivation
of the query provided:

1.  The number of distinct constants specified in the
query does not exceed $k_1$, where $k_1$ is bounded by
$\pi_{max}/\omega$ (biological data suggests that $k_1$ is small,
perhaps between 5 and 10).

The model suggests that as long as the number of
entities introduced by the query is 5 or less, there
will essentially be no cross-talk among the facts inferred
during reasoning. If more than 5 entities occur,
the window of synchrony would have to shrink
appropriately in order to accommodate all the entities.
As this window shrinks, the possibility of cross-talk
between bindings would increase until eventually
the cross-talk would become excessive and disrupt
the system’s ability to perform systematic reasoning.
The biological data suggests that a neurally
plausible upper bound on the number of distinct entities
that can occur in the reasoning process is about
10. Of course, these entities may occur in multiple
facts and participate in a number of inferences.

It may be significant that the bound on the number
of entities that may be referenced by the active
facts during a derivation relates well to 7 ± 2,
the robust measure of short-term memory capacity
[Miller, 1956]. Note, however, that SHRUTI does
not place a small limit on the number of facts that
can be simultaneously active; indeed a very large
number of facts can be involved in each derivation
carried out by SHRUTI.

2.  During the processing of the query, each predicate
may only be instantiated at most $k_2$ times.

Note that this restriction only applies to run-time
or ‘dynamic’ instantiations of predicates and not to
‘long-term’ facts stored in the system. As argued in
[Shastri, 1992], a plausible value of $k_2$ is somewhere
between 3 and 5. Also, $k_2$ need not be the same for
all predicates. The application of a SHRUTI-like
model to parsing by Henderson also suggests that
a value of $k_2$ under 3 may be sufficient for parsing
English sentences.

Some  typical  retrieval  and  inference  timings 

If we set system parameters of SHRUTI to some neurally
motivated values, SHRUTI demonstrates that a system
made up of simple and slow neuron-like elements can
perform a wide range of inferences (forward, backward,
and those involving a type hierarchy) within a few
hundred milliseconds.

If we choose the period of oscillation of nodes to
be 20 milliseconds, assume that nodes can synchronize
within two periods of oscillation and pick $k_2$ equal to 3,
SHRUTI takes 320 milliseconds to infer ‘John is jealous
of Tom’ after being given the dynamic facts ‘John loves
Susan’ and ‘Susan loves Tom’ (this involves the rule ‘if x
loves y and y loves z then x is jealous of z’). The system
takes 260 milliseconds to infer ‘John is a sibling of Jack’
given ‘Jack is a sibling of John’ (this involves the rule ‘if
x is a sibling of y then y is a sibling of x’). Similarly, the
system takes 320 milliseconds to infer ‘Susan owns a car’
after its internal state is initialized to represent ‘Susan
bought a Rolls-Royce’ (using the rule ‘if x buys y then x
owns y’ and the IS-A relation ‘Rolls-Royce is a car’).

If SHRUTI’s long-term memory contains the fact ‘John bought a Rolls-Royce’, SHRUTI takes
140 milliseconds, 420 milliseconds, and 740 milliseconds, respectively, to answer ‘yes’ to
the queries ‘Did John buy a Rolls-Royce?’, ‘Does John own a car?’ and ‘Can John sell a
car?’ (the last query also makes use of the rule ‘if x owns y then x can sell y’). Note
that the second and third queries also involve inferences using rules as well as IS-A
relations.
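The three queries above can be mimicked at a purely symbolic level. The following is a minimal sketch, not SHRUTI's neural mechanism and not a model of its timings: it assumes a toy backward chainer in which every rule has a single antecedent with the same argument order, and the names `IS_A`, `FACTS`, `RULES`, `is_a` and `holds` are hypothetical helpers introduced only for illustration.

```python
# Toy symbolic stand-in for the queries above (assumption: not how SHRUTI computes).
IS_A = {"Rolls-Royce": "car"}              # instance or subtype -> supertype
FACTS = {("buy", "John", "Rolls-Royce")}   # long-term facts
RULES = {"own": "buy", "can-sell": "own"}  # consequent -> antecedent, same argument order


def is_a(item, category):
    """True if item equals category or reaches it by following IS-A links upward."""
    while item is not None:
        if item == category:
            return True
        item = IS_A.get(item)
    return False


def holds(pred, agent, obj):
    """Answer a yes/no query by matching stored facts, then chaining backward over rules."""
    for (p, a, o) in FACTS:
        if p == pred and a == agent and is_a(o, obj):
            return True
    antecedent = RULES.get(pred)
    return antecedent is not None and holds(antecedent, agent, obj)


print(holds("buy", "John", "Rolls-Royce"))  # 'Did John buy a Rolls-Royce?'
print(holds("own", "John", "car"))          # 'Does John own a car?'
print(holds("can-sell", "John", "car"))     # 'Can John sell a car?'
```

Each successive query needs one more rule application than the previous one, which is consistent with the increasing response times reported above.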

The above times are independent of |LTKB| and do not increase when additional rules,
facts, and IS-A relationships are added. If anything, these times may decrease if a new
rule is added that leads to a shorter inference path.


5  Conclusion 

We have proposed a criterion for tractable reasoning that is appropriate in the context of
common sense reasoning underlying language understanding. We have suggested that an
appropriate measure of tractability for such reasoning is one where the time complexity is
independent of, and the space complexity is no more than linear in, the size of the
long-term knowledge base. We have also identified a class of reasoning that is tractable in
this sense. This characterization of tractability can be further refined by cognitive and
biological considerations. This work suggests that the expressiveness and the inferential
ability of a representation and reasoning system may be limited in unusual ways to arrive
at extremely efficient yet fairly powerful knowledge representation and reasoning systems.

1. INTRODUCTION

Artificial intelligence (AI) and cognitive science have made considerable advances over the
past four decades, but it is widely believed that solutions resulting from the “classical”
approach to these disciplines lack scalability, gradedness, robustness, and adaptability.
Consider scalability. Although existing AI systems may perform credibly within restricted
domains, they do not scale up: as the domain grows larger a system’s performance degrades
drastically and it can no longer solve interesting problems in acceptable time-scales.
Consider gradedness. Research in AI and cognitive science has made it apparent that the
solution of a cognitive task emerges as a result of rich context-sensitive interactions
among a large number of graded factors. Classical models, rooted in the von Neumann
architecture, are not suited for articulating this view of computation (for a discussion of
robustness and adaptability see SYMBOLS IN NEURAL REPRESENTATIONS).

Research in connectionism is motivated by the belief that in order to address the
limitations mentioned above, one must pay attention to the computational characteristics of
the brain. After all, the brain is the only existing physical system that exhibits the
requisite attributes, and it seems reasonable to expect that identifying neurally motivated
constraints (albeit at an abstract computational level) and incorporating them into our
models would lead to novel and critical insights.

In addition to recognizing the importance of neurally motivated constraints, the structured
connectionist approach (Feldman et al., 1988) also recognizes that a number of insights
acquired by disciplines such as computer science, AI, psychology, linguistics and learning
theory will have to be leveraged in developing solutions to difficult problems in AI and
cognitive science. These insights pertain to recognizing the power of problem
decomposition, hierarchical processing, and structured representations; the need for
representational and inferential adequacy; and the role of complexity analysis.

A key difference between the structured connectionist approach and the so-called
distributed approach is as follows: The fully distributed approach assumes that each “item”
(a concept or mental object) is represented as a pattern of activity distributed over a
common pool of nodes (van Gelder, 1992). This notion of representation suffers from several
fundamental limitations. Consider the representation of ‘John and Mary’. If ‘John’ and
‘Mary’ are represented as patterns of activity over the entire network such that each node
in the network has a specific value in the patterns for ‘John’ and ‘Mary’, respectively,
then how can the network represent ‘John’ and ‘Mary’ at the same time? The situation gets
even more complex if the system has to represent relations such as ‘John loves Mary’, or
‘John loves Mary but Tom loves Susan’. In contrast to the distributed approach, the
structured approach holds that small clusters of nodes can have distinct representational
status (for simplicity, structured connectionist models often equate a small cluster of
nodes with a single idealized node). In particular, there exist small clusters of nodes
that act as ‘focal’ nodes or ‘handles’ of learned concepts and provide access to more
elaborate node structures which make up the detailed encoding of concepts. Such a detailed
encoding might include various features of the concept as well as its relationship to other
concepts (see (Feldman, 1989) and the article by Shastri in (Barnden and Pollack, 1991)).
The fully distributed view is also inconsistent with the continually emerging data about
the localization of function in the brain.

The structured approach is often wrongly equated with the so-called “grandmother cell”
approach, which assumes that each concept is represented by a distinct node. This
misunderstanding stems from an incorrect interpretation of the representational role of
‘focal’ nodes.

A second difference between the structured and the fully distributed approaches concerns
learning. The latter underplays the importance of structure and assumes that essentially
all the required structure emerges as a result of general-purpose learning processes
operating on relatively unstructured hidden layers. The structured approach holds that such
a tabula-rasa view is untenable on grounds of computational complexity; training
unstructured networks using general-purpose learning techniques is not a feasible
methodology for obtaining scalable solutions to complex problems. The structured approach
emphasizes the importance of prior structure for effective learning and requires that the
initial design of network models (for example, the broad representational significance of
nodes, the number of representational levels in the network, and the network
interconnection pattern) reflect the structure of the problem.

2.  SOME  NEURAL  CONSTRAINTS  ON  COGNITIVE  MODELS 
2.1.  Representational  constraints 

With over 10^11 computing elements and 10^15 interconnections, the human brain's capacity
for encoding, communicating, and processing information seems awesome. But if the brain is
extremely powerful it is also extremely limited: First, neurons are slow computing devices.
Second, although the spatio-temporal integration of inputs performed by neurons is quite
complex, it is relatively undifferentiated with reference to the needs of symbolic
computation. Third, neurons communicate via relatively simple ‘messages’ that can encode
only a few bits of information. Hence a neuron’s output cannot be expected to encode names,
pointers, or complex structures.

A specific limitation of neurally plausible systems is that they have difficulty
representing composite structures in a dynamic fashion (also see COMPOSITIONALITY IN NEURAL
SYSTEMS). Consider the representation of the fact give(John, Mary, Book1). This fact cannot
be represented dynamically by simply activating the roles giver, recipient, and
give-object, and the constituents ‘John’, ‘Mary’, and ‘Book1’. Such a representation would
be indistinguishable from the representation of give(Mary, John, Book1). The problem is
that representing a fact requires representing the appropriate bindings between roles and
their fillers. It is easy to represent static (long-term) bindings using dedicated nodes
and links. For example, one could posit a separate ‘binder’ node for each role-filler pair
to represent role-filler bindings. Such a scheme is adequate for representing long-term
knowledge because the required binder nodes may be recruited over time. This scheme,
however, is implausible for representing dynamic bindings arising during language
understanding and visual processing since it is unlikely that there exist mechanisms for
establishing new links within such time scales. The alternative, that interconnections
between all possible pairs of roles and fillers already exist and the appropriate ones
become “active” temporarily to represent dynamic bindings, is also ruled out given the
prohibitively large number of such role-filler bindings. Techniques for representing
bindings based on the von Neumann architecture cannot be used since the storage and
processing capacity of nodes and the resolution of their outputs are insufficient to store
and communicate names or pointers.
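The ambiguity described above can be made concrete with a small sketch. It assumes two hypothetical encodings introduced only for illustration: `active_nodes`, which records merely which role and filler nodes are active, and `role_bindings`, which pairs each role with its filler. Neither is claimed to be how any of the systems discussed here stores a fact.

```python
ROLES = ("giver", "recipient", "give-object")


def active_nodes(roles, fillers):
    """Activation-only encoding: a fact is just the set of nodes that are 'on'."""
    return frozenset(roles) | frozenset(fillers)


def role_bindings(roles, fillers):
    """Binding-preserving encoding: each role is paired with its filler."""
    return frozenset(zip(roles, fillers))


fact_a = ("John", "Mary", "Book1")  # give(John, Mary, Book1)
fact_b = ("Mary", "John", "Book1")  # give(Mary, John, Book1)

# Without bindings the two facts are indistinguishable.
print(active_nodes(ROLES, fact_a) == active_nodes(ROLES, fact_b))    # True
# With explicit role-filler bindings they differ.
print(role_bindings(ROLES, fact_a) == role_bindings(ROLES, fact_b))  # False
```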

2.2.  Scalability  in  time 

We  can  visually  recognize  items  from  a  potential  pool  of  100,000  commonplace  items  in  about 
a  hundred  milliseconds  and  can  understand  language  at  the  rate  of  several  words  per  second, 
even  though  doing  so  involves  perceptual  processing,  lexical  access,  parsing,  and  reasoning. 
This  indicates  that  we  can  perform  a  wide  range  of  visual,  linguistic,  and  inferential  tasks 
within  a  few  hundred  milliseconds.  This  observation  provides  a  powerful  constraint  that  can 
inform  our  search  for  cognitive  models  (Feldman  and  Ballard,  1982). 

2.3.  Scalability  in  space 

Although  the  number  of  neurons  in  the  brain  is  quite  large,  it  is  not  too  large  compared  to 
the  ‘size’  of  the  problems  it  must  solve!  Consider  vision  and  reasoning.  The  retinal  output 
consists  of  a  million  signals  and  similarly,  our  common  sense  knowledge  base  may  contain 
more  than  a  million  items.  This  suggests  that  any  model  of  vision  or  reasoning  whose  node 
requirement  grows  quadratically  or  higher  with  respect  to  the  size  of  the  problem,  may  not 
be  neurally  plausible. 
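A rough order-of-magnitude check illustrates the point. The sketch below assumes about 10^11 neurons (a commonly quoted figure, used here as an assumption) and a problem size of one million items; `nodes_needed` is a hypothetical helper, and constant factors are ignored.

```python
NEURONS = 10**11  # assumed order of magnitude for the number of neurons
ITEMS = 10**6     # 'a million' retinal signals or knowledge-base items


def nodes_needed(n, exponent):
    """Node requirement of a model that grows as n ** exponent (constants ignored)."""
    return n ** exponent


for exponent in (1, 2):
    need = nodes_needed(ITEMS, exponent)
    print(exponent, need, need <= NEURONS)
```

Under these assumptions a linear model needs 10^6 nodes, well within budget, whereas a quadratic model needs 10^12 nodes, more than the assumed number of neurons.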

2.4. From constraints to predictions

While cognitive agents solve a wide range of tasks with extreme efficiency, their cognitive
ability is also limited in a number of ways; examples abound in vision, language, and
short-term memory. It is expected that structured connectionist models that incorporate the
representational and scalability constraints discussed above would help in understanding
and explaining not only the strengths of human cognition but also some of its limitations.

3.  SOME  STRUCTURED  CONNECTIONIST  MODELS 
3.1.  Early  work 

One of the earliest examples of a structured connectionist model was the interactive
activation model for letter perception by McClelland and Rumelhart (1981). The model
consisted of three layers of nodes corresponding to visual letter features, letters, and
words. Nodes representing mutually exclusive hypotheses within the letter and word layers
inhibited each other. For example, since only one letter may exist in a given letter
position, all nodes representing letters in the same position inhibited each other. A node
in the feature layer was connected via excitatory connections to nodes in the letter layer
representing letters that contained that feature. Similarly, a node in the letter layer was
connected via excitatory connections to nodes in the word layer representing words that
contained that letter in the appropriate position. Additionally, there were reciprocal
connections from the word layer to the letter layer. The interconnection pattern allowed
bottom-up perceptual processing to be guided by top-down expectations. The model could
explain a number of psychological findings about the preference of words and pronounceable
non-words over other non-words and isolated letters.

Other examples of early structured connectionist models were word sense disambiguation
models developed by Cottrell and Small (1983) and Waltz and Pollack (1985). Most words have
multiple senses but we are able to exploit contextual and syntactic information to rapidly
disambiguate the meanings of words. These models demonstrated how such disambiguation might
occur. Cottrell and Small’s model was a three-level network comprising the lexical (word)
level, the word-sense level, and the case level. There were inhibitory links between
different noun senses of the same word, and between different predicate senses of the same
word. A node at the lexical level was connected to all its senses at the word-sense level.
Connections between the word-sense level and the case level expressed all feasible bindings
between predicates and objects. As a sentence was input by activating the appropriate
lexical items in a sequence, activation flowed through the network and the combination of
lexical items, word senses and case assignment that best fit the input formed a stable
coalition of active nodes.

Another example of a structured connectionist model was the connectionist semantic network
model, CSN (Shastri, 1988). CSN viewed memory as a collection of concepts organized in an
IS-A hierarchy (e.g., ‘Bird IS-A Animal’) and allowed the attachment of property values to
concepts. Unlike a traditional semantic network, a property-value attachment in CSN
consisted of distributional information indicating how members of a concept were
distributed with respect to the different values of the property. CSN could answer (i)
inheritance queries, i.e., infer the most likely value of a specified property for a given
concept, and (ii) recognition queries, i.e., given a description consisting of
property-value pairs, find the concept that best matched the given description. CSN found
answers to queries by combining information encoded in the network in accordance with an
evidential formalization based on the principle of maximum entropy. In particular, CSN
could use distributional information to deal with exceptional and conflicting information
in a principled manner and disambiguate ‘multiple inheritance’ situations that could not be
dealt with by extant formulations of inheritance in AI.

CSN encoded concepts, properties and values using ‘focal’ nodes. The IS-A relations were
encoded as links, and property values were attached to concepts by connecting the
appropriate property, value and concept nodes via binder nodes. The weights on links
between concept, value, and binder nodes captured the distributional information associated
with a property-value attachment. A query was posed by activating appropriate nodes.
Thereafter, CSN performed the required inferences automatically by propagating graded
activations and combining activations using appropriate activation combination rules.
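To give a flavor of an inheritance query over distributional information, here is a deliberately simplified sketch. It is not CSN's evidential, maximum-entropy formalization and involves no activation spreading: it assumes a nearest-ancestor lookup over raw counts. The concept ‘Penguin’, the property ‘moves-by’, all counts, and the names `COUNTS` and `most_likely_value` are hypothetical.

```python
IS_A = {"Penguin": "Bird", "Bird": "Animal"}  # child -> parent in the IS-A hierarchy

# COUNTS[concept][property][value] = number of members with that value (made-up figures).
COUNTS = {
    "Bird": {"moves-by": {"flying": 90, "walking": 10}},
    "Penguin": {"moves-by": {"walking": 95, "swimming": 5}},
}


def most_likely_value(concept, prop):
    """Return the most frequent value recorded at the nearest concept that has data."""
    while concept is not None:
        dist = COUNTS.get(concept, {}).get(prop)
        if dist:
            return max(dist, key=dist.get)
        concept = IS_A.get(concept)  # climb one IS-A link and try again
    return None


print(most_likely_value("Bird", "moves-by"))     # flying
print(most_likely_value("Penguin", "moves-by"))  # walking (the exception overrides Bird)
print(most_likely_value("Animal", "moves-by"))   # None (no distribution recorded)
```

The nearest-ancestor rule handles a simple exception, but unlike the evidential approach described above it gives no principled answer when several ancestors supply conflicting distributions.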

3.2.  Recent  models  of  memory  and  reasoning 

The models above made significant contributions but were limited in their expressive power
and inferential ability. One of their key limitations was that they did not address the
dynamic binding problem. For example, the McClelland and Rumelhart model required n-fold
repetition of letter and feature layers to deal with words of length n; it could not
dynamically bind a letter to a position in a word. The Cottrell and Small and Waltz and
Pollack systems pre-wired all possible bindings using dedicated nodes and links. Recently
there has been significant progress in solving this problem. Systems that address it
include the CONPOSIT system (see article by Barnden and Srinivas in (Barnden and Pollack,
1991)), the ROBIN system (Lange and Dyer, 1989), the SHRUTI system (Shastri and
Ajjanagadde, 1993), and the CONSYDERR system (Sun, 1992). We give a brief overview of
SHRUTI, which shares a number of representational and functional features with ROBIN but
differs from it in the mechanism used for representing dynamic bindings.

Figure 1: Encoding of predicates, concepts and the rules ∀x,y,z [ give(x,y,z) ⇒ own(y,z) ],
∀x,y [ buy(x,y) ⇒ own(x,y) ], and ∀x,y [ own(x,y) ⇒ can-sell(x,y) ].

SHRUTI can encode a large number of specific facts, general rules, as well as IS-A
relations between concepts and perform a broad class of reasoning with extreme efficiency.
SHRUTI encodes an n-ary predicate as a cluster of nodes which includes n role nodes (refer
to Figure 1). Nodes such as John and Mary correspond to focal nodes of the complete
representations of the individuals ‘John’ and ‘Mary’. A rule is encoded by linking the
roles of the antecedent and consequent predicates in accordance with the correspondence
between roles specified in the rule. SHRUTI represents dynamic bindings using synchronous
firing of the appropriate argument and concept nodes. For example, the dynamic fact
give(John, Mary, Book1) is represented by a rhythmic pattern of activity wherein the focal
nodes John, Mary, and Book1 are firing in synchrony with the role nodes giver, recip, and
g-obj respectively. By virtue of the interconnections between role nodes of the predicates
give, own, and can-sell, this state of activation evolves so that (i) owner, and in turn,
p-seller start firing in synchrony with recip and hence Mary, and (ii) o-obj and, in turn,
cs-obj start firing in synchrony with g-obj and hence Book1. The resulting firing pattern
corresponds to the dynamic facts give(John, Mary, Book1), own(Mary, Book1), and
can-sell(Mary, Book1). The key assumption here is that if nodes A and B are linked, the
firing of A leads to a synchronous firing of B. For more on the role of synchrony and the
dynamic binding problem see COMPOSITIONALITY IN NEURAL SYSTEMS.
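The propagation of bindings just described can be sketched symbolically by treating "fires in synchrony with" as "shares a phase label with". This is an illustrative abstraction under stated assumptions, not a simulation of oscillating nodes: the integer phase slots in `PHASE`, the `LINKS` table and the helpers `propagate` and `filler` are hypothetical, only the give ⇒ own and own ⇒ can-sell rules are wired in, and multiple instantiations of a predicate are not handled.

```python
# Hypothetical phase slots: each active entity fires in its own phase of the cycle.
PHASE = {"John": 0, "Mary": 1, "Book1": 2}

# Role-to-role links for two of the rules in Figure 1.
LINKS = {
    "recip": ["owner"],     # give(x,y,z) => own(y,z): the recipient becomes the owner
    "g-obj": ["o-obj"],     # give(x,y,z) => own(y,z): the given object becomes the owned object
    "owner": ["p-seller"],  # own(x,y) => can-sell(x,y)
    "o-obj": ["cs-obj"],    # own(x,y) => can-sell(x,y)
}


def propagate(role_phase):
    """Spread phases along links until no further role node starts firing."""
    state = dict(role_phase)
    frontier = list(state)
    while frontier:
        role = frontier.pop()
        for target in LINKS.get(role, []):
            if target not in state:
                state[target] = state[role]  # target fires in synchrony with its source
                frontier.append(target)
    return state


def filler(state, role):
    """Return the entity whose phase matches the role's phase, or None if the role is silent."""
    for entity, phase in PHASE.items():
        if state.get(role) == phase:
            return entity
    return None


# Dynamic fact give(John, Mary, Book1): each role node shares the phase of its filler.
start = {"giver": PHASE["John"], "recip": PHASE["Mary"], "g-obj": PHASE["Book1"]}
final = propagate(start)
print(filler(final, "owner"), filler(final, "o-obj"))      # Mary Book1 -> own(Mary, Book1)
print(filler(final, "p-seller"), filler(final, "cs-obj"))  # Mary Book1 -> can-sell(Mary, Book1)
```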

SHRUTI can encode long-term facts and a bounded number of instantiations of each predicate
and concept. The latter allows it to deal with reasoning involving ‘bounded recursion’.
SHRUTI can also represent a type (IS-A) hierarchy and allows categories as well as
instances in rules, facts, and queries. By using appropriate weights on links, the system
can also encode soft/evidential rules. The time SHRUTI takes to generate a chain of
inference is independent of the total number of rules and facts and is equal to l * α,
where l is the number of steps in the chain of inference and α is the time required for
connected nodes to synchronize. If we assume α to be about 100 milliseconds, SHRUTI
demonstrates that a system of simple computing elements can encode millions of items and
draw interesting inferences in a few hundred milliseconds. An implementation of the system
on a CM-5 encodes over 300,000 items and responds to queries with derivation lengths of up
to 8 in under a second.
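As a worked instance of the relation above (inference time equals the number of steps times the synchronization time), the following sketch uses the 100-millisecond value assumed in the text. The function name `inference_time_ms` and the chosen chain lengths are illustrative only; the sketch says nothing about the timings of the CM-5 implementation.

```python
def inference_time_ms(steps, alpha_ms=100):
    """Time to generate a chain of inference: number of steps times synchronization time."""
    return steps * alpha_ms


# The result depends only on the chain length, not on how many rules and facts are stored.
for steps in (1, 3, 5):
    print(steps, inference_time_ms(steps))
```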

Instead of using synchronous firing of nodes to represent and propagate bindings, ROBIN and
CONSYDERR assign a distinct “signature” to each concept and propagate these codes to
establish bindings. A signature may take the form of a unique activation value or a pattern
of activity. CONPOSIT creates bindings by virtue of the relative position of active nodes
and the similarity of patterns. The use of temporal synchrony in SHRUTI leads to a number
of predictions about the capacity of the working memory underlying rapid reasoning (WMRR).
Thus SHRUTI predicts that a very large number of facts may be co-active in WMRR and a large
number of rules may fire simultaneously as long as (i) the maximum number of distinct
entities that can occur as role-fillers in the dynamic facts is small (at most 10), and
(ii) only a small number of instances of each predicate (≈ 3) are co-active at the same
time. The temporal approach also predicts that the depth to which an agent may reason
rapidly but systematically is bounded. All these constraints are motivated by biological
considerations. For example, each entity participating in dynamic bindings occupies a
distinct phase and hence the number of distinct entities that can occur as role-fillers in
dynamic facts cannot exceed $\lfloor \pi_{max} / \omega \rfloor$, where $\pi_{max}$ is the
maximum delay between consecutive firings of synchronous cell-clusters, and $\omega$ equals
the allowable jitter in the firing times of synchronous cell-clusters.
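Taking the bound to be the maximum delay between consecutive firings divided by the allowable jitter, rounded down (our reading of the definitions above), a small calculation shows how a limit of about ten entities could arise. The numeric values below (a 20-millisecond maximum delay and 2 or 3 milliseconds of jitter) are assumptions chosen for illustration, and `max_entities` is a hypothetical helper.

```python
import math


def max_entities(pi_max_ms, omega_ms):
    """Number of non-overlapping phase slots of width omega that fit into one period pi_max."""
    return math.floor(pi_max_ms / omega_ms)


print(max_entities(20, 2))  # 10 distinct phases fit with 2 ms of jitter
print(max_entities(20, 3))  # 6 distinct phases fit with 3 ms of jitter
```

The bound tightens as the jitter grows, which is why the predicted capacity is small even though the number of co-active facts can be large.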

Henderson (1994) has developed an on-line parser for English using a SHRUTI-like
architecture. The parser’s speed is independent of the size of the grammar and it can
recover the structure of arbitrarily long sentences as long as the dynamic state required
to parse the sentence does not exceed the capacity of the parser’s working memory. The
parser shows that the constraints on the working memory help explain several properties of
human parsing involving long distance dependencies, garden path effects and our limited
ability to deal with center-embedding.

3.3.  Significance  of  structure 

The representational and inferential power of structured connectionist systems such as
SHRUTI and their ability to draw inferences in parallel are directly attributable to their
use of structured representations. Any system that uses fully distributed representations
will be incapable of representing multiple dynamic facts and applying multiple rules
simultaneously. Attempts to develop distributed systems to handle relations invariably end
up positing several distinct banks, one for each role, thereby stepping away from a fully
distributed mode, or fall back on seriality. It is not surprising that distributed systems
such as DCPS (Touretzky and Hinton, 1988) have extremely limited capacity for encoding
dynamic structures and are serial at the level of rule-application.

3.4.  Learning  in  structured  networks 

The models discussed thus far did not address the issue of learning in detail (though an
outline of how rule-learning might occur in a SHRUTI-like system appears in (Shastri and
Ajjanagadde, 1993)). Regier’s (1992) model for learning the lexical semantics of natural
language spatial terms provides a concrete example of learning within the structured
connectionist paradigm. The model observes movies of simple 2-dimensional objects moving
relative to one another, where each movie is labeled as an example of some spatial term
from a natural language, and learns the association between the label (word) and the event
or relation it describes. The model successfully learned several spatial terms for diverse
natural languages. The model includes structured network components that reflect prior
constraints about the task as well as the usual “hidden layers”, and demonstrates how
structured connectionist networks can incorporate flexible learning ability and, at the
same time, leverage prior structure to achieve tractability.

In  addition  to  incremental  learning  driven  by  repeated  exposure  to  a  large  body  of  training 
data,  structured  models  have  also  made  use  of  one-shot  learning  using  recruitment  learning 
schemes  (e.g.,  see  (Shastri,  1988)  and  article  by  Diederich  in  (Barnden  and  Pollack,  1991)). 

4.  DISCUSSION 

Structured  connectionism  offers  a  rich  framework  for  developing  models  of  cognition  that 
are  guided  by  biological,  behavioral,  and  computational  constraints.  The  approach  has  been 
productive  and  resulted  in  a  number  of  models  that  are  informed  by  insights  from  diverse 
disciplines such as computer science, AI, psychology, linguistics and neuroscience. With
some difficult representational problems resolved, the focus of work is shifting toward the
study of structured adaptive networks that are grounded in perception and action.

Introduction: The dynamic binding problem is a central problem in neural information
processing. During visual processing it takes the familiar form of the segmentation problem
and the feature binding problem: visual processing requires the rapid grouping of
information over the spatial extent of an object and across different feature maps so that
features belonging to one object are grouped together and not confused with those belonging
to another. The dynamic binding problem, however, is not restricted to vision and arises in
any cognitive activity that requires the rapid instantiation and integration of structured
representations. In particular, it arises in myriad ways during language understanding,
where dynamic bindings are required to support, among other things, reasoning, syntactic
processing, and the integration of syntactic and semantic structures.

A promising solution to the dynamic binding problem based on temporal synchrony has emerged
over the past few years. The possibility of using synchronous activity of appropriate cells
to encode bindings was suggested several years ago by Malsburg (1981), but the idea has
found its full expression more recently with the development of a range of models that use
temporal synchrony to carry out segmentation (Malsburg & Buhmann 1992), object recognition
(Hummel & Biederman 1992), extraction of attention maps (Niebur, Koch, & Rosin 1993), rapid
reasoning (Shastri & Ajjanagadde 1993) and parsing of English (Henderson 1994). Emerging
data from neurophysiology also seem to suggest that synchronous activity may play a role in
the representation of dynamic bindings in the animal brain (e.g., Eckhorn et al. 1988; Gray
& Singer 1989; Kreiter & Singer 1992).

The assumption that dynamic bindings are encoded using synchronous activity has important
representational and processing implications. We examine these implications in the context
of rapid reasoning and identify several specific constraints on the processing of
structured information that are suggested by the use of synchrony. Work on a model of rapid
reasoning and the parsing of English sentences suggests that interesting cognitive tasks
can be solved within these constraints. This suggests that synchrony may be a sufficiently
powerful mechanism for representing dynamic bindings.

Reasoning and the dynamic binding problem: Although the dynamic binding problem underlies
most cognitive tasks, the nature of the representational structures instantiated by dynamic
bindings varies from one task to another. Consider the segmentation problem where parts of
the input-field have to be attributed to distinct objects. In this task, dynamic bindings
may be best viewed as instantiating sets, or unary relations. All elements in the
input-field that belong to the same object are grouped together to form a set. Let us
examine the sort of representational structures instantiated by dynamic bindings during
rapid (reflexive) reasoning underlying language understanding.*

Assume that an agent’s long-term memory (LTM) embodies the following systematic knowledge:
‘If someone gives a recipient an object then the recipient comes to own that object’. Given
the above knowledge an agent would be capable of inferring ‘Mary owns a book’ on being told
‘John gave Mary a book’. A neurally plausible reasoning system should therefore exhibit the
following behavior: If the network’s pattern of activity is initialized to represent ‘John
gave Mary a book’ then very soon, its activity should evolve to include the representation
of ‘Mary owns a book’.

The preparation of this paper was supported by ONR grant N00014-93-1-1149.

* Empirical data strongly suggest that inferences required to establish referential and
causal coherence occur rapidly and automatically during text understanding (see, e.g.,
Carpenter & Just 1977). Thus certain kinds of inferences can be drawn very rapidly, within
a few hundred milliseconds. The speed and spontaneity with which we understand language
highlights our ability to perform such inferences without conscious effort, as though they
were a reflexive response of our cognitive apparatus. In view of this we have described
such reasoning as reflexive.

A  network  must  solve  several  technical  problems  in  order  to  incorporate  the  above  behavior.  Before 
discussing  these  problems  let  us  introduce  some  notation.  A  specific  event  such  as  ‘John  gave  Mary  a  book’ 
can  be  viewed  as  an  instance  of  the  three  place  relation  give  with  roles:  giver,  recipient,  and  give-object  and 
expressed  as  the  fact  give(John,  Mary,  a-book).  The  systematic  rule-like  knowledge  given  above  about  giving 
and owning may be succinctly expressed as: (1) give(x,y,z) => own(y,z), wherein own is a two place relation
with roles: owner and own-object and ‘=>’ informally means ‘leads to’.

Dynamic  representation  of  facts  requires  dynamic  bindings:  The  reasoning  system  must  be  capable 
of  rapidly  representing  facts  such  as  give(John,  Mary,  a-book)  as  and  when  they  are  “communicated”  to  the 
system by other perceptual or linguistic processes and as they arise internally as a result of the reasoning
process. Note that the fact give(John, Mary, a-book) cannot be represented by simply activating the
representations of the roles giver, recipient, and give-object, and the constituents ‘John’, ‘Mary’, and ‘a-book’,
since such a representation would be indistinguishable from that of give(a-book, Mary, John). This fact, like
any other instantiation of an n-ary relation, is a composite structure wherein each constituent fills a distinct
role in a relation. Consequently the representation of such a fact requires the representation of appropriate
bindings between the roles of the relation and its fillers. Thus the dynamic representation of give(John,
Mary, a-book) requires the creation of dynamic bindings (giver=John, recipient=Mary, give-object=a-book).
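
The distinction can be made concrete with a small sketch. The Python below is an illustration only: the function names (`active_set`, `phase_bindings`, `decode`) and the use of integer phase indices are assumptions introduced here, not notation from the model. It contrasts merely activating roles and fillers with assigning each filler a phase and letting each role fire in the phase of its filler.

```python
# Illustrative sketch; names and data layout are assumptions, not the model's own.
ROLES = ("giver", "recipient", "give-object")

def active_set(fillers):
    """Activation-only encoding: records which nodes are active, nothing more."""
    return frozenset(ROLES) | frozenset(fillers)

def phase_bindings(fillers):
    """Synchrony-style encoding: each entity gets its own phase and each
    role fires in the phase of the entity that fills it."""
    entity_phase = {e: i for i, e in enumerate(sorted(set(fillers)))}
    role_phase = {r: entity_phase[f] for r, f in zip(ROLES, fillers)}
    return entity_phase, role_phase

def decode(entity_phase, role_phase):
    """Recover role -> filler by matching phases."""
    by_phase = {p: e for e, p in entity_phase.items()}
    return {r: by_phase[p] for r, p in role_phase.items()}

fact_a = ("John", "Mary", "a-book")   # give(John, Mary, a-book)
fact_b = ("a-book", "Mary", "John")   # give(a-book, Mary, John)

print(active_set(fact_a) == active_set(fact_b))          # the two facts collide
print(phase_bindings(fact_a) == phase_bindings(fact_b))  # the two facts differ
print(decode(*phase_bindings(fact_a)))
```

Under the activation-only encoding the two facts are identical, whereas the phase-based encoding keeps them apart and allows the role-filler pairs to be read back.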

The multiple-instantiation problem: Reasoning often requires the simultaneous activation of more than
one fact pertaining to the same relation. For example, the system may have to encode give(John, Mary,
a-book) and give(Mary, Tom, a-car) at the same time. A reasoning system must be capable of keeping multiple
instantiations  of  the  same  relation  active  without  cross-talk  between  instantiations. 

Reasoning involves systematic propagation of dynamic bindings: Another problem concerns the
dynamic generation of inferred facts. For example, starting with a dynamic representation of give(John, Mary,
a-book) the state of a network encoding rule (1) should evolve rapidly to include the dynamic representation
of the inferred fact: own(Mary, a-book). Generating inferred facts involves the systematic propagation of
dynamic bindings in accordance with various rules embodied in the system. The rule give(x,y,z) => own(y,z)
specifies that a give event leads to an own event wherein the recipient of a give event corresponds to the
owner of an own event and the give-object of a give event corresponds to the own-object of an own event.
Thus the application of this rule in conjunction with the instance give(John, Mary, a-book) should create an
instance of own with bindings (owner=Mary, own-object=a-book).
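
Viewed abstractly, applying the rule amounts to copying bindings along a fixed correspondence between roles. The following Python sketch makes that explicit; the dictionary layout and the name `apply_rule` are assumptions chosen for illustration, and the sketch ignores timing entirely.

```python
# Illustrative sketch; the dictionary layout and function name are assumptions.

# Rule (1): each consequent role is linked to the antecedent role it corresponds to.
GIVE_TO_OWN = {"owner": "recipient", "own-object": "give-object"}

def apply_rule(role_map, antecedent_bindings):
    """Each consequent role inherits the filler of its linked antecedent role."""
    return {cons: antecedent_bindings[ante] for cons, ante in role_map.items()}

give_fact = {"giver": "John", "recipient": "Mary", "give-object": "a-book"}
own_fact = apply_rule(GIVE_TO_OWN, give_fact)
print(own_fact)
```

The giver role has no counterpart in own, so its binding is simply not propagated.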

Long-term  facts  as  temporal-pattern  matchers:  In  addition  to  encoding  domain  rules,  the  reasoning 
system  must  also  be  capable  of  encoding  facts  in  its  LTM  and  using  them  during  recall,  recognition,  query- 
answering,  and  reasoning.  For  example,  a  reasoning  system  should  be  capable  of  encoding  the  fact  ‘John 
bought  a  Rolls-Royce’  in  its  LTM  and  using  it  to  rapidly  answer  queries  such  as  ‘Did  John  buy  a  Rolls- 
Royce?’  and  ‘Does  John  own  a  car?’  Observe  that  a  fact  in  LTM  should  store  the  associated  bindings 
as static bindings within a long-term structure which can interact with dynamic bindings and detect the
occurrence  of  dynamic  bindings  that  match  the  stored  static  bindings. 
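
A long-term fact can thus be thought of as a detector over the current pattern of phases. The sketch below is a simplified illustration under stated assumptions: the role names `buyer` and `buy-object`, the integer phases, and the function `fact_matches` are all hypothetical and introduced only for this example.

```python
# Illustrative sketch; role names, phases, and the function name are assumptions.

def fact_matches(static_bindings, role_phase, entity_phase):
    """A stored fact 'fires' when every role it mentions is currently firing
    in the same phase as the entity the fact statically binds to that role."""
    return all(
        role in role_phase
        and entity in entity_phase
        and role_phase[role] == entity_phase[entity]
        for role, entity in static_bindings.items()
    )

# Long-term fact: 'John bought a Rolls-Royce'.
stored = {"buyer": "John", "buy-object": "a-Rolls-Royce"}

# Dynamic state: phases currently occupied by active entities.
entity_phase = {"John": 0, "a-Rolls-Royce": 1, "Mary": 2}

query_john = {"buyer": 0, "buy-object": 1}  # 'Did John buy a Rolls-Royce?'
query_mary = {"buyer": 2, "buy-object": 1}  # 'Did Mary buy a Rolls-Royce?'

print(fact_matches(stored, query_john, entity_phase))
print(fact_matches(stored, query_mary, entity_phase))
```

Only the first query produces a match, because only there do the dynamic bindings coincide with the stored static bindings.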

Parsing  and  the  dynamic  binding  problem:  In  addition  to  reasoning,  language  understanding  requires 
a solution to the dynamic binding problem for syntactic processing and the dynamic linking of syntactic and
semantic  structures.  In  particular,  parsing  requires  the  extraction  of  constituents  in  a  sequence  of  words 
and  determining  the  appropriate  place  of  these  constituents  in  the  overall  phrase  structure  of  the  sentence. 
From the point of view of the representation and processing of dynamic bindings one can draw the following
analogy between parsing and reasoning: non-terminals of the underlying grammar correspond to ‘entities’ in
the reasoning system, structural relations among non-terminals such as ‘dominates’ and ‘precedes’ correspond
to  ‘relations’,  grammatical  constraints  and  phrase  structure  combination  operators  correspond  to  ‘rules’,  and 
the  representation  of  the  phrase  structure  during  the  parsing  process  corresponds  to  the  collection  of  ‘dynamic 
facts’  active  in  the  network’s  state  of  activation. 

Overview of a model of reflexive reasoning: We have developed SHRUTI, a computational
model of reflexive reasoning which incorporates solutions to the problems discussed above. SHRUTI can
encode a large number of specific facts and general rules involving n-ary relations as well as sub/superordinate
relations between concepts and perform a broad class of reasoning with extreme efficiency. It
solves the dynamic binding problem by maintaining and propagating dynamic bindings using synchronous
firing of appropriate nodes. The dynamic fact ‘John gave Mary a book’ is represented in SHRUTI by the
clusters for ‘John’ and ‘giver’ firing in synchrony; the clusters for ‘Mary’ and ‘recipient’ firing in synchrony;
and the clusters for an instance of ‘book’ and ‘give-object’ firing in synchrony. The view of information
processing implied by SHRUTI is one where i) reasoning is the transient but systematic propagation of a
rhythmic  pattern  of  activity,  ii)  each  entity  in  the  dynamic  memory  is  a  phase  in  the  above  rhythmic 
activity,  iii)  dynamic  bindings  are  represented  as  the  synchronous  firing  of  appropriate  nodes,  iv)  rules  are 
interconnection  patterns  that  cause  the  propagation  and  transformation  of  rhythmic  patterns  of  activity,  and 
v)  long-term  facts  are  subnetworks  that  act  as  temporal  pattern  matchers  and  become  active  when  certain 
cell-clusters  fire  synchronously. 

The details of the model may be found in (Shastri & Ajjanagadde 1993). In brief, we posit that each
n-place  relation  be  encoded  as  a  bank  of  nodes  containing  n  role  nodes,  a  collector  node  and  an  enabler 
node.  Here  ‘node’  refers  to  an  idealized  computational  device  which  corresponds  to  a  small  cluster  of  cells. 
A  rule  is  encoded  by  linking  the  roles  of  the  antecedent  and  consequent  relations  in  accordance  with  the 
correspondence between roles specified in the rule. For example, the rule give(x,y,z) => own(y,z) is encoded
by connecting the roles recipient and give-object of the relation give to the roles owner and own-object of the
relation own, respectively. By virtue of the interconnections between role nodes of give and own, the state of
activation resulting from the pattern of activation for give(John, Mary, a-book) leads to an activation pattern
wherein the roles owner and own-object start firing in synchrony with recipient and give-object respectively,
and hence, with Mary and a-book respectively. Thus starting with a pattern containing the dynamic fact
give(John, Mary, a-book), the network state evolves such that the pattern of activation includes the dynamic
fact own(Mary, a-book). The key assumption here is that if there is a link from node A to node B, the
firing of A will lead to a synchronous firing of B. Note that the time taken to generate a chain of inference
is independent of the total number of rules and facts and is just equal to l * α, where l equals the length
of the chain of inference and α equals the time required for connected nodes to synchronize. If we assume a
plausible value of α (under 100 milliseconds), SHRUTI demonstrates that it is possible for a system of simple
computing  elements  to  encode  millions  of  rules  and  facts  and  draw  interesting  multiple-step  inferences  within 
a  few  hundred  milliseconds. 
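
The claim that inference time depends on the length of the chain and not on the size of the knowledge base can be illustrated with a toy simulation. The sketch below is an assumption-laden simplification: rules are reduced to links between relation names, all links fire in parallel in discrete rounds, the rule own => can-sell and the filler rules r0, r1, ... are hypothetical, and the 50 ms synchronization time is an arbitrary value chosen from the stated range of under 100 ms.

```python
# Illustrative sketch; rule names, the 50 ms figure, and function names are assumptions.

def steps_to_derive(rules, start, goal):
    """Rounds of parallel propagation until `goal` becomes active.
    `rules` maps an antecedent relation to its consequent relations."""
    active, frontier, steps = {start}, {start}, 0
    while frontier and goal not in active:
        # Every active link fires in the same round, whatever the size of `rules`.
        frontier = {c for a in frontier for c in rules.get(a, ())} - active
        active |= frontier
        steps += 1
    return steps if goal in active else None

def inference_time_ms(chain_length, sync_time_ms):
    """Time for a chain of inference: chain length times synchronization time."""
    return chain_length * sync_time_ms

small_kb = {"give": ["own"], "own": ["can-sell"]}
# Add 10,000 unrelated rules; the derivation below does not touch them.
large_kb = dict(small_kb, **{f"r{i}": [f"r{i + 1}"] for i in range(10000)})

depth_small = steps_to_derive(small_kb, "give", "can-sell")
depth_large = steps_to_derive(large_kb, "give", "can-sell")
print(depth_small, depth_large, inference_time_ms(depth_small, 50))
```

Both knowledge bases need the same number of rounds for the same derivation, which is the sense in which the response time is independent of the number of rules and facts.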

SHRUTI  can  also  encode  long-term  facts  and  a  type  hierarchy.  The  latter  allows  reference  to  categories  as 
well as instances in rules, facts, and queries and the encoding of context sensitive rules such as: (walk-into(x,y)
=> hurt(x), but only in the context where the fillers of the two roles of walk-into have the feature solid). In
general,  the  expressive  power  of  SHRUTI  is  sufficient  to  encode  knowledge  structures  such  as  schemas,  frames, 
productions,  and  if-then  rules. 

Constraints on reflexive processing predicted by SHRUTI: We describe some of the
representational and processing constraints and predictions that follow from our attempt at engineering a reflexive
reasoning system based on temporal synchrony. The constraints and predictions relate to (i) the capacity
of the ‘working memory’ underlying reflexive processing, (ii) bounds on the depth of reasoning and
differences in the time course of associative priming versus systematic reasoning, (iii) the form of rules that may
participate  in  reflexive  processing,  and  (iv)  the  need  for  a  large  capacity  memory  capable  of  storing  relation 
instances  in  around  1  second. 

Working memory underlying reflexive processing: Dynamic bindings, and hence, dynamic (active)
facts are represented in SHRUTI as a rhythmic pattern of activity over nodes in the LTM network. In functional
terms,  this  transient  state  of  activation  holds  information  temporarily  during  an  episode  of  reflexive  reasoning 
and  corresponds  to  the  working  memory  underlying  reflexive  reasoning  (WMRR).  Note  that  WMRR  is  just 
the  state  of  activity  of  the  LTM  network  and  not  a  separate  buffer.  Also  note  that  the  dynamic  facts 
represented  in  the  WMRR  during  an  episode  of  reflexive  reasoning  should  not  be  confused  with  the  small 
number  of  short-term  facts  an  agent  may  overtly  keep  track  of  during  reflective  processing  and  problem 
solving.  WMRR  should  not  be  confused  with  the  short-term  memory  implicated  in  various  memory  span 
tasks  (Baddeley  1986).  In  our  view,  in  addition  to  the  overt  working  memory,  there  exist  as  many  “working 
memories” as there are major processes in the brain.

Our work predicts that the capacity of WMRR is very large but at the same time it is constrained in
critical ways. Thus the number of dynamic facts that can potentially be present in the working memory at
any given time can be as high as k2 * R where k2 is the multiple instantiation constant (see below) and R
is the number of relations known to the agent.

Bound on the number of distinct entities referenced in WMRR: During an episode of reflexive
reasoning, each entity involved in dynamic bindings occupies a distinct phase in the rhythmic pattern of
activity. Hence the number of distinct entities that can occur as role-fillers in the dynamic facts represented
in the working memory cannot exceed π_max/ω, where π_max is the maximum delay between two consecutive
firings of cell-clusters involved in synchronous firing and ω equals the width of the window of synchrony,
i.e., the maximum allowable lead/lag between the firing of synchronous cell-clusters. If we assume that a
neurally plausible value of π_max is about 30 ms and a conservative estimate of ω is around 6 ms, we are
led to the following prediction: As long as the number of distinct entities referenced by the dynamic facts in
the working memory is five or fewer, there will essentially be no cross-talk among the dynamic facts. If more
entities occur as role-fillers in dynamic facts, the window of synchrony ω would have to shrink appropriately
in order to accommodate all the entities. As ω shrinks, the possibility of cross-talk between dynamic bindings
would increase until eventually, the cross-talk would become excessive and disrupt the system’s ability to
perform systematic reasoning. The exact bound on the number of distinct entities that may fill roles in
dynamic facts would depend on the largest and smallest feasible values of π_max and ω respectively. However we can
safely predict that the upper bound on the maximum number of entities participating in dynamic bindings
can be no more than 10 (perhaps less).
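
The arithmetic behind this prediction is simple enough to spell out. In the sketch below, the function names and parameter names are assumptions made for illustration; the 30 ms maximum inter-firing delay and the 6 ms window of synchrony are the values quoted in the text.

```python
# Illustrative sketch; function and parameter names are assumptions.

def max_entities(max_delay_ms, window_ms):
    """Number of non-overlapping phases that fit in one cycle of activity."""
    return int(max_delay_ms // window_ms)

def required_window_ms(max_delay_ms, n_entities):
    """Window of synchrony needed to fit n_entities phases in one cycle."""
    return max_delay_ms / n_entities

print(max_entities(30, 6))         # phases available with the quoted values
print(required_window_ms(30, 10))  # window needed to fit ten entities
```

With the quoted values five phases fit in a cycle, and accommodating ten entities would require the window to shrink to 3 ms, which is why the text treats ten as a generous upper bound.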

Bound  on  the  multiple  instantiation  of  relations:  The  capacity  of  WMRR  is  also  limited  by  the 
constraint  that  each  relation  can  only  be  instantiated  a  bounded  number  of  times  (k2)  during  an  episode 
of  reasoning.  In  other  words  the  working  memory  can  contain  at  most  k2  dynamic  facts  per  relation  (we 
refer to k2 as the multiple instantiation constant). Note that the value of k2 need not be the same for all
relations; some critical relations may have a higher value of k2 while some other relations may have a smaller
value. The cost of maintaining multiple instantiations turns out to be significant in terms of space and time.
For example, the number of nodes required to encode a rule for backward reasoning is proportional to the
square of k2. Thus a system that can represent three dynamic instantiations of each relation may have up to
nine times as many nodes as a system that can only represent one instantiation per relation. Furthermore,
the worst case time required for propagating multiple instantiations of a relation also increases by a factor
of k2. In view of the additional space and time costs associated with multiple instantiation, and given the
necessity of keeping these resources within bounds in the context of reflexive processing, we predict that the
value of k2 is quite small, perhaps no more than 3.
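
The scaling relations stated above can be collected in a few lines. This is a sketch of the stated proportionalities only: the function names are assumptions, the costs are expressed relative to a system with one instantiation per relation, and the figure of 1,000 relations is an arbitrary example value.

```python
# Illustrative sketch; function names and the example value of R are assumptions.

def relative_rule_nodes(k2):
    """Nodes per backward-reasoning rule, relative to k2 = 1 (grows as k2 squared)."""
    return k2 ** 2

def relative_propagation_time(k2):
    """Worst-case propagation time, relative to k2 = 1 (grows linearly in k2)."""
    return k2

def max_dynamic_facts(k2, n_relations):
    """Upper limit on dynamic facts in working memory: k2 * R."""
    return k2 * n_relations

for k2 in (1, 2, 3):
    print(k2, relative_rule_nodes(k2), relative_propagation_time(k2),
          max_dynamic_facts(k2, 1000))
```

Going from one to three instantiations per relation triples the capacity and the worst-case time but multiplies the node count by nine, which motivates keeping k2 small.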

Bound  on  the  depth  of  the  chain  of  reasoning:  Consider  the  propagation  of  synchronous  activity  along 
a chain of role ensembles during an episode of reflexive reasoning. Two things might happen as activity
propagates  along  the  chain  of  role  ensembles.  First,  the  lag  in  the  firing  times  of  successive  ensembles  may 
gradually  build  up  due  to  the  propagation  delay  introduced  at  each  level  in  the  chain.  Second,  the  dispersion 
within  each  ensemble  may  gradually  increase  due  to  the  variations  in  the  propagation  delay  of  links  and  the 
noise inherent in synaptic and neuronal processes. While the increased lag along successive ensembles will
lead to a ‘phase shift’, and hence, binding confusions, the increased dispersion of activity within successive
ensembles will lead to a gradual loss of binding information. Increased dispersion would mean less phase
specificity, and hence, more uncertainty about the role’s filler. Due to the increase in dispersion along the
chain of reasoning, the propagation of activity will correspond less and less to a propagation of role bindings
and more and more to an associative spread of activation. For example, the propagation of activity along a
chain of rules such as: P1(x,y,z) => P2(x,y,z) => ... => Pn(x,y,z) due to a dynamic fact P1(a,b,c) may lead to
a  state  of  activation  where  all  one  can  say  about  Pn  is  this:  there  is  an  instance  of  Pn  which  involves  the 
entities  a,  b,  and  c,  but  it  is  not  clear  which  entity  fills  which  role  of  Pn.  In  view  of  the  above,  it  follows  that 
the  depth  to  which  an  agent  may  reason  during  reflexive  reasoning  is  bounded.  In  other  words,  an  agent  may 
be  unable  to  make  a  prediction  (or  answer  a  query)  —  even  when  the  prediction  (or  answer)  logically  follows 
from  the  knowledge  encoded  in  the  LTM  —  if  the  length  of  the  derivation  leading  to  the  prediction  (or  the 
answer)  exceeds  this  bound.  It  should  be  possible  to  relate  the  bound  on  the  depth  of  reflexive  reasoning  to 
specific  physiological  parameters  and  pointers  to  relevant  data  are  welcome. 
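
The growth of dispersion with depth can be illustrated with a toy Monte Carlo model. Everything in the sketch below is an assumption made for illustration and is not drawn from physiological data: each ensemble has 200 cells, each link adds an independent Gaussian delay with a 5 ms mean and 1 ms jitter, and a level counts as usable while the spread of firing times stays within half of a 6 ms window of synchrony. The sketch models only the second effect described above (dispersion within an ensemble), not the phase shift.

```python
# Toy model; all parameter values and the usability criterion are assumptions.
import random
import statistics

def dispersion_by_depth(depth, n_cells=200, mean_delay_ms=5.0,
                        jitter_ms=1.0, seed=0):
    """Spread (population std. dev.) of firing times at each level of a chain."""
    rng = random.Random(seed)
    times = [0.0] * n_cells
    spreads = []
    for _ in range(depth):
        # Each cell's activity reaches the next level after a noisy delay.
        times = [t + rng.gauss(mean_delay_ms, jitter_ms) for t in times]
        spreads.append(statistics.pstdev(times))
    return spreads

def usable_depth(spreads, window_ms=6.0):
    """Number of leading levels whose spread stays within half the window."""
    depth = 0
    for spread in spreads:
        if spread > window_ms / 2:
            break
        depth += 1
    return depth

spreads = dispersion_by_depth(12)
print([round(s, 2) for s in spreads])
print(usable_depth(spreads))
```

In this model the spread grows roughly with the square root of the depth, so some finite depth always exists beyond which phases can no longer be told apart; where that depth falls depends entirely on the assumed jitter and window.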

Form  of  rules  that  may  participate  in  reflexive  reasoning:  Using  complexity  theory  it  can  be  shown 
that  during  backward  reasoning  (i.e.,  query  answering)  it  is  not  possible  to  make  use  of  rules  containing 
equality  constraints  among  antecedent  roles  unless  (i)  such  roles  map  to  a  consequent  role  in  the  rule  and  (ii) 
the consequent role gets bound during the query answering process. A similar constraint applies to forward
(predictive) reasoning. These constraints predict that certain queries cannot be answered in a reflexive
manner even though the corresponding predictions can be made reflexively. For example, consider an agent
whose LTM includes the rule ‘if x loves y and y loves z then x is jealous of z’, and the long-term facts ‘John
loves  Mary’  and  ‘Mary  loves  Tom’.  We  predict  that  if  this  agent  is  asked  ‘Is  John  jealous  of  Tom?’,  she 
will  be  unable  to  answer  the  query  in  a  reflexive  manner.  Note  that  the  antecedent  of  the  rule  includes  the 
equality condition: the second role of one instance of ‘loves’ should equal the first role of the other instance
of ‘loves’. Hence, answering this question will require deliberate and conscious processing unless the relevant
long-term facts are active in the WMRR for some reason at the time the query is posed. However, an agent
who has the above rule about love and jealousy in its LTM would be able to infer ‘John is jealous of Tom’ in a
reflexive  manner,  on  being  ‘told’  ‘John  loves  Mary’  and  ‘Mary  loves  Tom’. 

Conclusion: The representational and inferential machinery of SHRUTI is fairly general and can be
applied  to  other  problems  in  cognition  that  require  the  expressive  power  of  n-ary  relations  and  depend  on 
the  rapid  and  systematic  interaction  between  long-term  and  dynamic  structures.  Thus  the  constraints  and 
predictions  discussed  above  may  carry  over  to  other  domains.  For  example,  Henderson  (1994)  has  adopted 
the SHRUTI model to design a parser of English whose speed is independent of the size of the grammar and
that can recover the structure of arbitrarily long sentences as long as the dynamic state required to parse
the  sentence  does  not  exceed  the  bounds  on  the  parser’s  working  memory.  The  parser’s  limited  working 
memory  explains  a  range  of  linguistic  phenomena  pertaining  to  our  limited  ability  to  deal  with  long  distance 
dependencies,  local  ambiguity,  and  center-embedding.  It  lends  evidence  to  our  belief  that  synchrony  may 
eventually turn out to be a sufficiently powerful mechanism for representing dynamic bindings.